diff mbox

[RESEND] x86/mm: only allow memmap=XX!YY over existing RAM

Message ID 1467102671-593138-1-git-send-email-yigal@plexistor.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yigal Korman June 28, 2016, 8:31 a.m. UTC
Before this patch, passing a range that is beyond the physical memory
range will succeed, the user will see a /dev/pmem0 and will be able to
access it. Reads will always return 0 and writes will be silently
ignored.

I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
failing that were eventually tracked down to be wrong values passed to
memmap.

This patch prevents the above issue by instead of adding a new memory
range, only update a RAM memory range with the PRAM type. This way,
passing the wrong memmap will either not give you a pmem at all or give
you a smaller one that actually has RAM behind it.

And if someone still needs to fake a pmem that doesn't have RAM behind
it, they can simply do memmap=XX@YY,XX!YY.

Signed-off-by: Yigal Korman <yigal@plexistor.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Boaz Harrosh <boaz@plexistor.com>
---
 arch/x86/kernel/e820.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dan Williams June 28, 2016, 4:33 p.m. UTC | #1
On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
> Before this patch, passing a range that is beyond the physical memory
> range will succeed, the user will see a /dev/pmem0 and will be able to
> access it. Reads will always return 0 and writes will be silently
> ignored.
>
> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
> failing that were eventually tracked down to be wrong values passed to
> memmap.
>
> This patch prevents the above issue by instead of adding a new memory
> range, only update a RAM memory range with the PRAM type. This way,
> passing the wrong memmap will either not give you a pmem at all or give
> you a smaller one that actually has RAM behind it.
>
> And if someone still needs to fake a pmem that doesn't have RAM behind
> it, they can simply do memmap=XX@YY,XX!YY.
>
> Signed-off-by: Yigal Korman <yigal@plexistor.com>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
> Tested-by: Boaz Harrosh <boaz@plexistor.com>
> ---

I have some other libnvdimm fixes heading upstream shortly if x86
folks just want to ack this one...
H. Peter Anvin June 28, 2016, 5:58 p.m. UTC | #2
On 06/28/16 09:33, Dan Williams wrote:
> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
>> Before this patch, passing a range that is beyond the physical memory
>> range will succeed, the user will see a /dev/pmem0 and will be able to
>> access it. Reads will always return 0 and writes will be silently
>> ignored.
>>
>> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
>> failing that were eventually tracked down to be wrong values passed to
>> memmap.
>>
>> This patch prevents the above issue by instead of adding a new memory
>> range, only update a RAM memory range with the PRAM type. This way,
>> passing the wrong memmap will either not give you a pmem at all or give
>> you a smaller one that actually has RAM behind it.
>>
>> And if someone still needs to fake a pmem that doesn't have RAM behind
>> it, they can simply do memmap=XX@YY,XX!YY.
>>
>> Signed-off-by: Yigal Korman <yigal@plexistor.com>
>> Acked-by: Dan Williams <dan.j.williams@intel.com>
>> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
>> Tested-by: Boaz Harrosh <boaz@plexistor.com>
>> ---
> 
> I have some other libnvdimm fixes heading upstream shortly if x86
> folks just want to ack this one...
> 

I'm concerned about this.  This would mean that memory not marked as RAM
because it is persistent and you don't want the OS to accidentally
clobber persistent RAM can't be added.  So it seems like The Wrong
Thing.  If all you want is simulated pram then it shouldn't care about
addresses in the first place and instead we should just specify it by
quantity.

	-hpa
Dan Williams June 29, 2016, 1:09 a.m. UTC | #3
On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 06/28/16 09:33, Dan Williams wrote:
>> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
>>> Before this patch, passing a range that is beyond the physical memory
>>> range will succeed, the user will see a /dev/pmem0 and will be able to
>>> access it. Reads will always return 0 and writes will be silently
>>> ignored.
>>>
>>> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
>>> failing that were eventually tracked down to be wrong values passed to
>>> memmap.
>>>
>>> This patch prevents the above issue by instead of adding a new memory
>>> range, only update a RAM memory range with the PRAM type. This way,
>>> passing the wrong memmap will either not give you a pmem at all or give
>>> you a smaller one that actually has RAM behind it.
>>>
>>> And if someone still needs to fake a pmem that doesn't have RAM behind
>>> it, they can simply do memmap=XX@YY,XX!YY.
>>>
>>> Signed-off-by: Yigal Korman <yigal@plexistor.com>
>>> Acked-by: Dan Williams <dan.j.williams@intel.com>
>>> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
>>> Tested-by: Boaz Harrosh <boaz@plexistor.com>
>>> ---
>>
>> I have some other libnvdimm fixes heading upstream shortly if x86
>> folks just want to ack this one...
>>
>
> I'm concerned about this.  This would mean that memory not marked as RAM
> because it is persistent and you don't want the OS to accidentally
> clobber persistent RAM can't be added.

Ah true.  Specifically you are worried about the case where a
platform-firmware has mis-marked pmem as reserved memory (or some
other type) and would like to correct it to be pram.

> So it seems like The Wrong
> Thing.  If all you want is simulated pram then it shouldn't care about
> addresses in the first place and instead we should just specify it by
> quantity.

Yes, agree we need an explicit "simulate pram" option independent of
memmap=, or just continue to educate users that if they try to
simulate pmem and specify an invalid range they get to keep all the
broken pieces.
Yigal Korman June 29, 2016, 5:12 a.m. UTC | #4
On Wed, Jun 29, 2016 at 4:09 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 06/28/16 09:33, Dan Williams wrote:
> >> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
> >>> Before this patch, passing a range that is beyond the physical memory
> >>> range will succeed, the user will see a /dev/pmem0 and will be able to
> >>> access it. Reads will always return 0 and writes will be silently
> >>> ignored.
> >>>
> >>> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
> >>> failing that were eventually tracked down to be wrong values passed to
> >>> memmap.
> >>>
> >>> This patch prevents the above issue by instead of adding a new memory
> >>> range, only update a RAM memory range with the PRAM type. This way,
> >>> passing the wrong memmap will either not give you a pmem at all or give
> >>> you a smaller one that actually has RAM behind it.
> >>>
> >>> And if someone still needs to fake a pmem that doesn't have RAM behind
> >>> it, they can simply do memmap=XX@YY,XX!YY.
> >>>
> >>> Signed-off-by: Yigal Korman <yigal@plexistor.com>
> >>> Acked-by: Dan Williams <dan.j.williams@intel.com>
> >>> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
> >>> Tested-by: Boaz Harrosh <boaz@plexistor.com>
> >>> ---
> >>
> >> I have some other libnvdimm fixes heading upstream shortly if x86
> >> folks just want to ack this one...
> >>
> >
> > I'm concerned about this.  This would mean that memory not marked as RAM
> > because it is persistent and you don't want the OS to accidentally
> > clobber persistent RAM can't be added.
>
> Ah true.  Specifically you are worried about the case where a
> platform-firmware has mis-marked pmem as reserved memory (or some
> other type) and would like to correct it to be pram.


As I mentioned in the patch, this is still possible by doing memmap=X@Y,X!Y
Also, with fixes in grub and the kernel regarding mis-marking NVDIMMs
this is much less common today.
My purpose was simply to prevent a repeating user error for the common use case.

>
>
> > So it seems like The Wrong
> > Thing.  If all you want is simulated pram then it shouldn't care about
> > addresses in the first place and instead we should just specify it by
> > quantity.
>
> Yes, agree we need an explicit "simulate pram" option independent of
> memmap=, or just continue to educate users that if they try to
> simulate pmem and specify an invalid range they get to keep all the
> broken pieces.

I'd love to have a simpler way to specify simulated pram, but quantity
is not good enough.
For my use case, for example, I need the quantity to be spread evenly
over all NUMA nodes, so just getting a range "somewhere" is not good.
And I can imagine other users that want to pin pram to same socket
where their high speed NIC sits.
So I not sure we can find a better general api than memmap and I not
sure it's worth it.

Thanks,
Yigal
Dan Williams June 29, 2016, 2:41 p.m. UTC | #5
On Tue, Jun 28, 2016 at 10:12 PM, Yigal Korman <yigal@plexistor.com> wrote:
> On Wed, Jun 29, 2016 at 4:09 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> > On 06/28/16 09:33, Dan Williams wrote:
>> >> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
>> >>> Before this patch, passing a range that is beyond the physical memory
>> >>> range will succeed, the user will see a /dev/pmem0 and will be able to
>> >>> access it. Reads will always return 0 and writes will be silently
>> >>> ignored.
>> >>>
>> >>> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
>> >>> failing that were eventually tracked down to be wrong values passed to
>> >>> memmap.
>> >>>
>> >>> This patch prevents the above issue by instead of adding a new memory
>> >>> range, only update a RAM memory range with the PRAM type. This way,
>> >>> passing the wrong memmap will either not give you a pmem at all or give
>> >>> you a smaller one that actually has RAM behind it.
>> >>>
>> >>> And if someone still needs to fake a pmem that doesn't have RAM behind
>> >>> it, they can simply do memmap=XX@YY,XX!YY.
>> >>>
>> >>> Signed-off-by: Yigal Korman <yigal@plexistor.com>
>> >>> Acked-by: Dan Williams <dan.j.williams@intel.com>
>> >>> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
>> >>> Tested-by: Boaz Harrosh <boaz@plexistor.com>
>> >>> ---
>> >>
>> >> I have some other libnvdimm fixes heading upstream shortly if x86
>> >> folks just want to ack this one...
>> >>
>> >
>> > I'm concerned about this.  This would mean that memory not marked as RAM
>> > because it is persistent and you don't want the OS to accidentally
>> > clobber persistent RAM can't be added.
>>
>> Ah true.  Specifically you are worried about the case where a
>> platform-firmware has mis-marked pmem as reserved memory (or some
>> other type) and would like to correct it to be pram.
>
>
> As I mentioned in the patch, this is still possible by doing memmap=X@Y,X!Y

Sorry, yes I overlooked this in my response to Peter.

> Also, with fixes in grub and the kernel regarding mis-marking NVDIMMs
> this is much less common today.
> My purpose was simply to prevent a repeating user error for the common use case.

It makes sense, but at the same time it still involves a user
education burden that memmap=X!Y is constrained by default.

>>
>>
>> > So it seems like The Wrong
>> > Thing.  If all you want is simulated pram then it shouldn't care about
>> > addresses in the first place and instead we should just specify it by
>> > quantity.
>>
>> Yes, agree we need an explicit "simulate pram" option independent of
>> memmap=, or just continue to educate users that if they try to
>> simulate pmem and specify an invalid range they get to keep all the
>> broken pieces.
>
> I'd love to have a simpler way to specify simulated pram, but quantity
> is not good enough.
> For my use case, for example, I need the quantity to be spread evenly
> over all NUMA nodes, so just getting a range "somewhere" is not good.
> And I can imagine other users that want to pin pram to same socket
> where their high speed NIC sits.
> So I not sure we can find a better general api than memmap and I not
> sure it's worth it.

I think it would be worthwhile to have something like testpmem= which
takes the same parameters as memmap=, but it is constrained to RAM by
default.  Then it becomes clear that the user really does want a
simulation configuration on RAM rather than explicitly specifying a
new persistent memory range.
diff mbox

Patch

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 621b501..4bd4207 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -878,7 +878,7 @@  static int __init parse_memmap_one(char *p)
 		e820_add_region(start_at, mem_size, E820_RESERVED);
 	} else if (*p == '!') {
 		start_at = memparse(p+1, &p);
-		e820_add_region(start_at, mem_size, E820_PRAM);
+		e820_update_range(start_at, mem_size, E820_RAM, E820_PRAM);
 	} else
 		e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);