diff mbox

x86/mm: Add x86 valid_phys_addr_range() for /dev/mem

Message ID 1455069975-14291-1-git-send-email-toshi.kani@hpe.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kani, Toshi Feb. 10, 2016, 2:06 a.m. UTC
x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
leads /dev/mem to use the default valid_phys_addr_range()
and valid_mmap_phys_addr_range() in drivers/char/mem.c.

The default valid_phys_addr_range() allows any range lower
than __pa(high_memory), which is the end of system RAM, and
disallows any range higher than it.

Persistent memory may be located at lower and/or higher
address of __pa(high_memory) depending on their memory slots.
When using crash(8) via /dev/mem for analyzing data in
persistent memory, it can only access to the one lower than
__pa(high_memory).

Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
to provide better checking:
 - Physical address range is valid when it is fully backed by
   IORESOURCE_MEM, regardless of __pa(high_memory).
 - Other ranges, including holes, are invalid.

This also allows crash(8) to access persistent memory ranges
via /dev/mem (with a minor change to remove high_memory check
from crash itself).

Note, /dev/mem makes additional check with devmem_is_allowed()
for read/write when CONFIG_STRICT_DEVMEM is set, and does always
for mmap.  CONFIG_IO_STRICT_DEVMEM provides further restriction.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
---
This patch applies on top of the patch-set below, and is based
on the tip tree.
https://lkml.org/lkml/2016/1/26/886
---
 arch/x86/include/asm/io.h |    3 +++
 arch/x86/mm/init.c        |   24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

Comments

Ingo Molnar Feb. 17, 2016, 9:29 a.m. UTC | #1
* Toshi Kani <toshi.kani@hpe.com> wrote:

> x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
> leads /dev/mem to use the default valid_phys_addr_range()
> and valid_mmap_phys_addr_range() in drivers/char/mem.c.
> 
> The default valid_phys_addr_range() allows any range lower
> than __pa(high_memory), which is the end of system RAM, and
> disallows any range higher than it.
> 
> Persistent memory may be located at lower and/or higher
> address of __pa(high_memory) depending on their memory slots.
> When using crash(8) via /dev/mem for analyzing data in
> persistent memory, it can only access to the one lower than
> __pa(high_memory).
> 
> Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
> to provide better checking:
>  - Physical address range is valid when it is fully backed by
>    IORESOURCE_MEM, regardless of __pa(high_memory).
>  - Other ranges, including holes, are invalid.
> 
> This also allows crash(8) to access persistent memory ranges
> via /dev/mem (with a minor change to remove high_memory check
> from crash itself).
> 
> Note, /dev/mem makes additional check with devmem_is_allowed()
> for read/write when CONFIG_STRICT_DEVMEM is set, and does always
> for mmap.  CONFIG_IO_STRICT_DEVMEM provides further restriction.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> ---
> This patch applies on top of the patch-set below, and is based
> on the tip tree.
> https://lkml.org/lkml/2016/1/26/886
> ---
>  arch/x86/include/asm/io.h |    3 +++
>  arch/x86/mm/init.c        |   24 ++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index de25aad..189901a 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -36,6 +36,7 @@
>  
>  #define ARCH_HAS_IOREMAP_WC
>  #define ARCH_HAS_IOREMAP_WT
> +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
>  
>  #include <linux/string.h>
>  #include <linux/compiler.h>
> @@ -326,6 +327,8 @@ extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
>  extern void __iomem *ioremap_wt(resource_size_t offset, unsigned long size);
>  
>  extern bool is_early_ioremap_ptep(pte_t *ptep);
> +extern int valid_phys_addr_range(phys_addr_t addr, size_t size);
> +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t size);
>  
>  #ifdef CONFIG_XEN
>  #include <xen/xen.h>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 493f541..35cf96f 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -624,6 +624,30 @@ void __init init_mem_mapping(void)
>  	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
>  }
>  
> +/**
> + * valid_phys_addr_range - check phys addr for /dev/mem read and write
> + *
> + * Return true if a target physical address is marked as IORESOURCE_MEM.
> + */
> +int valid_phys_addr_range(phys_addr_t addr, size_t size)
> +{
> +	return (region_intersects(addr, size, IORESOURCE_MEM,
> +				  IORES_DESC_NONE) == REGION_INTERSECTS);
> +}
> +
> +/**
> + * valid_mmap_phys_addr_range - check phys addr for /dev/mem mmap
> + *
> + * Return true if a target physical address is marked as IORESOURCE_MEM.
> + */
> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
> +{
> +	resource_size_t addr = pfn << PAGE_SHIFT;
> +
> +	return (region_intersects(addr, size, IORESOURCE_MEM,
> +				  IORES_DESC_NONE) == REGION_INTERSECTS);
> +}
> +
>  /*
>   * devmem_is_allowed() checks to see if /dev/mem access to a certain address
>   * is valid. The argument is a physical page number.

So it's hard to judge the quality of these new APIs without seeing their actual 
usecases. So please Cc: me to whatever work this is used in, and I'll have a look 
in that context.

Thanks,

	Ingo
Kani, Toshi Feb. 17, 2016, 4:58 p.m. UTC | #2
On Wed, 2016-02-17 at 10:29 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hpe.com> wrote:
> 
> > x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
> > leads /dev/mem to use the default valid_phys_addr_range()
> > and valid_mmap_phys_addr_range() in drivers/char/mem.c.
> > 
> > The default valid_phys_addr_range() allows any range lower
> > than __pa(high_memory), which is the end of system RAM, and
> > disallows any range higher than it.
> > 
> > Persistent memory may be located at lower and/or higher
> > address of __pa(high_memory) depending on their memory slots.
> > When using crash(8) via /dev/mem for analyzing data in
> > persistent memory, it can only access to the one lower than
> > __pa(high_memory).
> > 
> > Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
> > to provide better checking:
> >  - Physical address range is valid when it is fully backed by
> >    IORESOURCE_MEM, regardless of __pa(high_memory).
> >  - Other ranges, including holes, are invalid.
> > 
> > This also allows crash(8) to access persistent memory ranges
> > via /dev/mem (with a minor change to remove high_memory check
> > from crash itself).
> > 
> > Note, /dev/mem makes additional check with devmem_is_allowed()
> > for read/write when CONFIG_STRICT_DEVMEM is set, and does always
> > for mmap.  CONFIG_IO_STRICT_DEVMEM provides further restriction.
 :
> So it's hard to judge the quality of these new APIs without seeing their
> actual usecases. So please Cc: me to whatever work this is used in, and
> I'll have a look in that context.

Great!  The source code of crash(8) is available in the github below.
https://github.com/crash-utility/crash

crash is a tool to analyze memory data via a crashdump file or /dev/mem.  I
am trying to make this tool works on NVDIMM via /dev/mem. (NVDIMM ranges
are not saved to a crashdump file, but are persistent anyway.)  We had BTT
metadata corruptions, and this tool can be helpful to verify such data
until we have better tools.
https://lkml.org/lkml/2016/2/11/674

When /dev/mem is specified as the source, read_dev_mem() is set to pc-
>readmem as the read function to /dev/mem.  read_dev_mem() is at line 2268
of the file blow.
https://github.com/crash-utility/crash/blob/master/memory.c

read_dev_mem() has the same high_memory check (i.e. mirroring the /dev/mem
restriction), which should also be removed after this patch is accepted.
 Falling back to /dev/kmem does not help since read_kmem() in the 
kernel has check to high_memory in the function itself.  With this patch,
read_dev_mem() works on NVDIMM ranges.

Let me know if you have any question.
Thanks,
-Toshi
Dan Williams Feb. 17, 2016, 5:58 p.m. UTC | #3
On Tue, Feb 9, 2016 at 6:06 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
> leads /dev/mem to use the default valid_phys_addr_range()
> and valid_mmap_phys_addr_range() in drivers/char/mem.c.
>
> The default valid_phys_addr_range() allows any range lower
> than __pa(high_memory), which is the end of system RAM, and
> disallows any range higher than it.
>
> Persistent memory may be located at lower and/or higher
> address of __pa(high_memory) depending on their memory slots.
> When using crash(8) via /dev/mem for analyzing data in
> persistent memory, it can only access to the one lower than
> __pa(high_memory).
>
> Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
> to provide better checking:
>  - Physical address range is valid when it is fully backed by
>    IORESOURCE_MEM, regardless of __pa(high_memory).
>  - Other ranges, including holes, are invalid.
>
> This also allows crash(8) to access persistent memory ranges
> via /dev/mem (with a minor change to remove high_memory check
> from crash itself).

If we're modifying crash(8) can't we also teach it to mmap /dev/pmemX
directly?  With commit 90a545e98126 "restrict /dev/mem to idle io
memory ranges" /dev/mem should not have access to active pmem ranges.
Dan Williams Feb. 17, 2016, 7:05 p.m. UTC | #4
On Wed, Feb 17, 2016 at 11:35 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Wed, 2016-02-17 at 09:58 -0800, Dan Williams wrote:
>> On Tue, Feb 9, 2016 at 6:06 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
>> > x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
>> > leads /dev/mem to use the default valid_phys_addr_range()
>> > and valid_mmap_phys_addr_range() in drivers/char/mem.c.
>> >
>> > The default valid_phys_addr_range() allows any range lower
>> > than __pa(high_memory), which is the end of system RAM, and
>> > disallows any range higher than it.
>> >
>> > Persistent memory may be located at lower and/or higher
>> > address of __pa(high_memory) depending on their memory slots.
>> > When using crash(8) via /dev/mem for analyzing data in
>> > persistent memory, it can only access to the one lower than
>> > __pa(high_memory).
>> >
>> > Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
>> > to provide better checking:
>> >  - Physical address range is valid when it is fully backed by
>> >    IORESOURCE_MEM, regardless of __pa(high_memory).
>> >  - Other ranges, including holes, are invalid.
>> >
>> > This also allows crash(8) to access persistent memory ranges
>> > via /dev/mem (with a minor change to remove high_memory check
>> > from crash itself).
>>
>> If we're modifying crash(8) can't we also teach it to mmap /dev/pmemX
>> directly?  With commit 90a545e98126 "restrict /dev/mem to idle io
>> memory ranges" /dev/mem should not have access to active pmem ranges.
>
> Yes, I am aware of the commit.  Unloading drivers while using crash(8) to
> analyze NVDIMM via /dev/mem makes sense.  /dev/mem does not require any
> other drivers be loaded.

Ah, ok.  I thought this patch was bypassing that safety check.  If it
requires the driver to be unloaded first then I'm fine with this.

> Using /dev/pmemX, on the other hand, requires the driver to be loaded,
> which can be problematic.  For instance, when btt_init() fails due to some
> corruption in arena, it fails to create any pmem device file.  A dev file
> also restricts access range within the dev file.
>
> Thanks,
> -Toshi
>
> ps.
> Looking at iomem_is_exclusive(), it only checks the top-level iomem
> entries.  I think the pmem/btt driver only marks a child entry busy...
>

It looks to me that next_resource(), via r_next(), walks child ranges.
Kani, Toshi Feb. 17, 2016, 7:35 p.m. UTC | #5
On Wed, 2016-02-17 at 09:58 -0800, Dan Williams wrote:
> On Tue, Feb 9, 2016 at 6:06 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
> > leads /dev/mem to use the default valid_phys_addr_range()
> > and valid_mmap_phys_addr_range() in drivers/char/mem.c.
> > 
> > The default valid_phys_addr_range() allows any range lower
> > than __pa(high_memory), which is the end of system RAM, and
> > disallows any range higher than it.
> > 
> > Persistent memory may be located at lower and/or higher
> > address of __pa(high_memory) depending on their memory slots.
> > When using crash(8) via /dev/mem for analyzing data in
> > persistent memory, it can only access to the one lower than
> > __pa(high_memory).
> > 
> > Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
> > to provide better checking:
> >  - Physical address range is valid when it is fully backed by
> >    IORESOURCE_MEM, regardless of __pa(high_memory).
> >  - Other ranges, including holes, are invalid.
> > 
> > This also allows crash(8) to access persistent memory ranges
> > via /dev/mem (with a minor change to remove high_memory check
> > from crash itself).
> 
> If we're modifying crash(8) can't we also teach it to mmap /dev/pmemX
> directly?  With commit 90a545e98126 "restrict /dev/mem to idle io
> memory ranges" /dev/mem should not have access to active pmem ranges.

Yes, I am aware of the commit.  Unloading drivers while using crash(8) to
analyze NVDIMM via /dev/mem makes sense.  /dev/mem does not require any
other drivers be loaded.

Using /dev/pmemX, on the other hand, requires the driver to be loaded,
which can be problematic.  For instance, when btt_init() fails due to some
corruption in arena, it fails to create any pmem device file.  A dev file
also restricts access range within the dev file.

Thanks,
-Toshi

ps.
Looking at iomem_is_exclusive(), it only checks the top-level iomem
entries.  I think the pmem/btt driver only marks a child entry busy...
Kani, Toshi Feb. 17, 2016, 8:15 p.m. UTC | #6
On Wed, 2016-02-17 at 11:05 -0800, Dan Williams wrote:
> On Wed, Feb 17, 2016 at 11:35 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Wed, 2016-02-17 at 09:58 -0800, Dan Williams wrote:
> > > On Tue, Feb 9, 2016 at 6:06 PM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > > > x86 does not define ARCH_HAS_VALID_PHYS_ADDR_RANGE, which
> > > > leads /dev/mem to use the default valid_phys_addr_range()
> > > > and valid_mmap_phys_addr_range() in drivers/char/mem.c.
> > > > 
> > > > The default valid_phys_addr_range() allows any range lower
> > > > than __pa(high_memory), which is the end of system RAM, and
> > > > disallows any range higher than it.
> > > > 
> > > > Persistent memory may be located at lower and/or higher
> > > > address of __pa(high_memory) depending on their memory slots.
> > > > When using crash(8) via /dev/mem for analyzing data in
> > > > persistent memory, it can only access to the one lower than
> > > > __pa(high_memory).
> > > > 
> > > > Add x86 valid_phys_addr_range() and valid_mmap_phys_addr_range()
> > > > to provide better checking:
> > > >  - Physical address range is valid when it is fully backed by
> > > >    IORESOURCE_MEM, regardless of __pa(high_memory).
> > > >  - Other ranges, including holes, are invalid.
> > > > 
> > > > This also allows crash(8) to access persistent memory ranges
> > > > via /dev/mem (with a minor change to remove high_memory check
> > > > from crash itself).
> > > 
> > > If we're modifying crash(8) can't we also teach it to mmap /dev/pmemX
> > > directly?  With commit 90a545e98126 "restrict /dev/mem to idle io
> > > memory ranges" /dev/mem should not have access to active pmem ranges.
> > 
> > Yes, I am aware of the commit.  Unloading drivers while using crash(8)
> > to analyze NVDIMM via /dev/mem makes sense.  /dev/mem does not require
> > any other drivers be loaded.
> 
> Ah, ok.  I thought this patch was bypassing that safety check.  If it
> requires the driver to be unloaded first then I'm fine with this.

Yes, crash(8) is used for analyzing static data (such as a crashdump file).
 It does not allow writing data unless user specifically enable it.

> > Using /dev/pmemX, on the other hand, requires the driver to be loaded,
> > which can be problematic.  For instance, when btt_init() fails due to
> > some corruption in arena, it fails to create any pmem device file.  A
> > dev file also restricts access range within the dev file.
> > 
> > Thanks,
> > -Toshi
> > 
> > ps.
> > Looking at iomem_is_exclusive(), it only checks the top-level iomem
> > entries.  I think the pmem/btt driver only marks a child entry busy...
> > 
> 
> It looks to me that next_resource(), via r_next(), walks child ranges.

Oh, sorry, I missed the r_next().

Thanks,
-Toshi
diff mbox

Patch

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index de25aad..189901a 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -36,6 +36,7 @@ 
 
 #define ARCH_HAS_IOREMAP_WC
 #define ARCH_HAS_IOREMAP_WT
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
 
 #include <linux/string.h>
 #include <linux/compiler.h>
@@ -326,6 +327,8 @@  extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_wt(resource_size_t offset, unsigned long size);
 
 extern bool is_early_ioremap_ptep(pte_t *ptep);
+extern int valid_phys_addr_range(phys_addr_t addr, size_t size);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t size);
 
 #ifdef CONFIG_XEN
 #include <xen/xen.h>
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 493f541..35cf96f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -624,6 +624,30 @@  void __init init_mem_mapping(void)
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
+/**
+ * valid_phys_addr_range - check phys addr for /dev/mem read and write
+ *
+ * Return true if a target physical address is marked as IORESOURCE_MEM.
+ */
+int valid_phys_addr_range(phys_addr_t addr, size_t size)
+{
+	return (region_intersects(addr, size, IORESOURCE_MEM,
+				  IORES_DESC_NONE) == REGION_INTERSECTS);
+}
+
+/**
+ * valid_mmap_phys_addr_range - check phys addr for /dev/mem mmap
+ *
+ * Return true if a target physical address is marked as IORESOURCE_MEM.
+ */
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
+{
+	resource_size_t addr = pfn << PAGE_SHIFT;
+
+	return (region_intersects(addr, size, IORESOURCE_MEM,
+				  IORES_DESC_NONE) == REGION_INTERSECTS);
+}
+
 /*
  * devmem_is_allowed() checks to see if /dev/mem access to a certain address
  * is valid. The argument is a physical page number.