
[1/2] x86: add support for the non-standard protected e820 type

Message ID CAE9FiQXg0DZ3oCGmPk+qubwQ_=9LLMrZTJqN6HPn0t+5Vs8+Jg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Yinghai Lu April 3, 2015, 5:12 p.m. UTC
On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
>   :
>> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
>>  /*
>>   * Find the highest page frame number we have available
>>   */
>> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
>>  {
>>       int i;
>>       unsigned long last_pfn = 0;
>> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>               unsigned long start_pfn;
>>               unsigned long end_pfn;
>>
>> -             if (ei->type != type)
>> +             /*
>> +              * Persistent memory is accounted as ram for purposes of
>> +              * establishing max_pfn and mem_map.
>> +              */
>> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
>>                       continue;
>
> Should we also delete this code, accounting E820_PRAM as ram, along with
> the deletion of reserve_pmem() in this version?

We should revert those end_of_ram changes, as in the attached patch.

Comments

Toshi Kani April 3, 2015, 8:54 p.m. UTC | #1
On Fri, 2015-04-03 at 10:12 -0700, Yinghai Lu wrote:
> On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
> >   :
> >> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
> >>  /*
> >>   * Find the highest page frame number we have available
> >>   */
> >> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> >> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
> >>  {
> >>       int i;
> >>       unsigned long last_pfn = 0;
> >> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> >>               unsigned long start_pfn;
> >>               unsigned long end_pfn;
> >>
> >> -             if (ei->type != type)
> >> +             /*
> >> +              * Persistent memory is accounted as ram for purposes of
> >> +              * establishing max_pfn and mem_map.
> >> +              */
> >> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
> >>                       continue;
> >
> > Should we also delete this code, accounting E820_PRAM as ram, along with
> > the deletion of reserve_pmem() in this version?
> 
> should revert those end_of_ram change as attached.

I confirmed that the pmem driver works with the change, and last_pfn is
updated as expected.

Thanks,
-Toshi
Ingo Molnar April 4, 2015, 9:40 a.m. UTC | #2
* Toshi Kani <toshi.kani@hp.com> wrote:

> On Fri, 2015-04-03 at 10:12 -0700, Yinghai Lu wrote:
> > On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > > On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
> > >   :
> > >> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
> > >>  /*
> > >>   * Find the highest page frame number we have available
> > >>   */
> > >> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> > >> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
> > >>  {
> > >>       int i;
> > >>       unsigned long last_pfn = 0;
> > >> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> > >>               unsigned long start_pfn;
> > >>               unsigned long end_pfn;
> > >>
> > >> -             if (ei->type != type)
> > >> +             /*
> > >> +              * Persistent memory is accounted as ram for purposes of
> > >> +              * establishing max_pfn and mem_map.
> > >> +              */
> > >> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
> > >>                       continue;
> > >
> > > Should we also delete this code, accounting E820_PRAM as ram, along with
> > > the deletion of reserve_pmem() in this version?
> > 
> > should revert those end_of_ram change as attached.
> 
> I confirmed that the pmem driver works with the change, and last_pfn is
> updated as expected.

Could someone please send the fix with a changelog, etc?

Thanks,

	Ingo
Yinghai Lu April 5, 2015, 7:44 a.m. UTC | #3
On Sat, Apr 4, 2015 at 2:40 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Toshi Kani <toshi.kani@hp.com> wrote:
>
>> On Fri, 2015-04-03 at 10:12 -0700, Yinghai Lu wrote:
>> > On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> > > On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
>> >
>> > should revert those end_of_ram change as attached.
>>
>> I confirmed that the pmem driver works with the change, and last_pfn is
>> updated as expected.
>
> Could someone please send the fix with a changelog, etc?
>

Why not just fold those changes into that commit?
Or you can just drop the patch and ask Christoph to resubmit an updated
patch.

I asked Christoph to remove reserve_pmem(), and he agreed to do that:

http://lkml.iu.edu/hypermail/linux/kernel/1503.3/02919.html
http://lkml.iu.edu/hypermail/linux/kernel/1503.3/03119.html

But sadly, he did not put me on the CC list when sending the updated patch,
and the next day you picked it up into the tip/pmem branch.
Otherwise we could have found the problem earlier.

And he did not even put my name in the changelog :-(
With that, I could have found the email earlier too.

Yinghai
Boaz Harrosh April 5, 2015, 9:18 a.m. UTC | #4
On 04/03/2015 08:12 PM, Yinghai Lu wrote:
> On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
>>   :
>>> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
>>>  /*
>>>   * Find the highest page frame number we have available
>>>   */
>>> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
>>>  {
>>>       int i;
>>>       unsigned long last_pfn = 0;
>>> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>>               unsigned long start_pfn;
>>>               unsigned long end_pfn;
>>>
>>> -             if (ei->type != type)
>>> +             /*
>>> +              * Persistent memory is accounted as ram for purposes of
>>> +              * establishing max_pfn and mem_map.
>>> +              */
>>> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
>>>                       continue;
>>
>> Should we also delete this code, accounting E820_PRAM as ram, along with
>> the deletion of reserve_pmem() in this version?
> 

Hi Yinghai, Toshi

In my old patches I did not have these updates either, and everything
was very much usable for a long time.

However, I actually liked these changes in Christoph's patches and
thought they should stay. Here is why.

Today I will be sending patches that make pmem supported with
page-struct as an optional alternative to the use of ioremap.
This is for advanced users who want to do RDMA, direct_IO and so
on directly out of pmem.
At one point we hit a BUG in some mm/memory.c code that was checking max_pfn.
Actually that was a bug, and we do not go through this code anymore. And between
us, that global variable max_pfn is a bad hack. But I would like pmem to be
covered by it as long as it is used, so that code that wants to protect itself
by max_pfn can still accept pmem memory submitted to it.

I have tried to audit the kernel's use of max_pfn and I do not see how
this can hurt. I do see where it would theoretically help.

Think of a system whose memory map looks like this:
1. VM (volatile memory)
2. PM
3. VM
4. PM

This is what current and planned NUMA implementations return.
So pmem region 2 will be covered by max_pfn, but pmem region 4 will not.
Any code that checks against max_pfn will be OK with pmem-2 but *not* with
pmem-4. This is highly unexpected.
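
The layout above can be sketched in userspace C. This is only a toy model,
not the kernel code: the toy_* names, the type codes, the 4 KiB page size and
the 1 GiB region sizes are all assumptions made for illustration. The loop
follows the shape of e820_end_pfn() with a type filter, so that counting only
RAM leaves the last pmem region above the computed end PFN.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical type codes for this sketch only, not the kernel's values. */
enum toy_type { TOY_RAM = 1, TOY_PRAM = 2 };

struct toy_entry {
	uint64_t addr;
	uint64_t size;
	enum toy_type type;
};

/* The VM / PM / VM / PM layout from the mail, 1 GiB per region (assumed). */
static const struct toy_entry toy_map[] = {
	{ 0x00000000ULL, 0x40000000ULL, TOY_RAM  },	/* 1. VM */
	{ 0x40000000ULL, 0x40000000ULL, TOY_PRAM },	/* 2. PM */
	{ 0x80000000ULL, 0x40000000ULL, TOY_RAM  },	/* 3. VM */
	{ 0xC0000000ULL, 0x40000000ULL, TOY_PRAM },	/* 4. PM */
};

/* Highest end PFN among entries of the given type, clamped to limit_pfn. */
static uint64_t toy_end_pfn(const struct toy_entry *map, size_t n,
			    uint64_t limit_pfn, enum toy_type type)
{
	uint64_t last_pfn = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t start_pfn, end_pfn;

		/* Entries of any other type do not move the end PFN. */
		if (map[i].type != type)
			continue;

		start_pfn = map[i].addr >> TOY_PAGE_SHIFT;
		end_pfn = (map[i].addr + map[i].size) >> TOY_PAGE_SHIFT;

		if (start_pfn >= limit_pfn)
			continue;
		if (end_pfn > limit_pfn) {
			last_pfn = limit_pfn;
			break;
		}
		if (end_pfn > last_pfn)
			last_pfn = end_pfn;
	}
	return last_pfn;
}
```

Calling toy_end_pfn(toy_map, 4, UINT64_MAX, TOY_RAM) on this map yields the
end of region 3. Region 2 starts below that value, while region 4 starts at
it, so a "pfn < max_pfn" check would treat the two pmem regions differently.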

I think max_pfn as a whole should be killed ASAP, but until it is,
it will not hurt for pmem to be covered.

Thanks
Boaz
Yinghai Lu April 5, 2015, 8:06 p.m. UTC | #5
On Sun, Apr 5, 2015 at 2:18 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 04/03/2015 08:12 PM, Yinghai Lu wrote:
>> On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>>> On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
>>>   :
>>>> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
>>>>  /*
>>>>   * Find the highest page frame number we have available
>>>>   */
>>>> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>>> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
>>>>  {
>>>>       int i;
>>>>       unsigned long last_pfn = 0;
>>>> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>>>               unsigned long start_pfn;
>>>>               unsigned long end_pfn;
>>>>
>>>> -             if (ei->type != type)
>>>> +             /*
>>>> +              * Persistent memory is accounted as ram for purposes of
>>>> +              * establishing max_pfn and mem_map.
>>>> +              */
>>>> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
>>>>                       continue;
>>>
>>> Should we also delete this code, accounting E820_PRAM as ram, along with
>>> the deletion of reserve_pmem() in this version?
>>
>
> Hi Yinghai, Toshi
>
> In my old patches I did not have these updates as well, and everything
> was very much usable, for a long time.
>
> However. I actually liked these changes in Christoph's patches and
> thought they should stay, here is why.
>
> Today I will be sending patches to make pmem be supported with
> page-struct as an optional alternative to the use of ioremap.
> This is for advanced users that wants to RDMA direct_IO and so
> on directly out of pmem.

But that is not related. We should just remove those lines.

And even his original memblock changes are not needed.

| You did not modify memblock_x86_fill() to treat
| E820_PRAM as E820_RAM, so memblock will not have any
| entry for E820_PRAM, so you do not need to call memblock_reserve
| there.
|
| And the same time, init_memory_mapping() will call
| init_range_memory_mapping/for_each_mem_pfn_range() to
| set kernel mapping for memory range in memblock only.
| So here calling init_memory_mapping will not do anything.
| then just drop calling to that init_memory_mapping.
| --- so will not kernel mapping pmem, is that what you intended to have?
|
| After those two changes, You do not need this reserve_pmem at all.
| Just drop it.
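
The argument quoted above can be illustrated with a small userspace model.
It is a simplified sketch and not the kernel implementation: the sk_* names,
the type codes and the sample ranges are hypothetical, and the fill step here
accepts RAM entries only. It shows that a range which never enters the
memblock-like list is never seen by a mapping pass that walks only that list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes for this sketch only. */
enum sk_type { SK_RAM = 1, SK_PRAM = 2 };

struct sk_range {
	uint64_t base;
	uint64_t size;
	enum sk_type type;
};

#define SK_MAX 8

/* Stand-in for the memblock memory list. */
struct sk_memblock {
	struct sk_range r[SK_MAX];
	size_t cnt;
};

/* Sample firmware map (assumed): RAM, then pmem, then RAM, 1 GiB each. */
static const struct sk_range sk_map[] = {
	{ 0x00000000ULL, 0x40000000ULL, SK_RAM  },
	{ 0x40000000ULL, 0x40000000ULL, SK_PRAM },
	{ 0x80000000ULL, 0x40000000ULL, SK_RAM  },
};

/* Copy only RAM entries, as a fill step that does not know about pmem. */
static size_t sk_fill(struct sk_memblock *mb, const struct sk_range *map,
		      size_t n)
{
	size_t i;

	mb->cnt = 0;
	for (i = 0; i < n && mb->cnt < SK_MAX; i++) {
		if (map[i].type != SK_RAM)
			continue;
		mb->r[mb->cnt++] = map[i];
	}
	return mb->cnt;
}

/* A mapping pass that walks only the filled list would cover addr or not. */
static int sk_mapped_after_fill(const struct sk_range *map, size_t n,
				uint64_t addr)
{
	struct sk_memblock mb;
	size_t i;

	sk_fill(&mb, map, n);
	for (i = 0; i < mb.cnt; i++)
		if (addr >= mb.r[i].base && addr - mb.r[i].base < mb.r[i].size)
			return 1;
	return 0;
}
```

With the sample map, sk_mapped_after_fill() returns 1 for an address in
either RAM range and 0 for an address in the pmem range. In this model there
is nothing in the list to reserve or map for pmem.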
Boaz Harrosh April 6, 2015, 7:16 a.m. UTC | #6
On 04/05/2015 11:06 PM, Yinghai Lu wrote:
> On Sun, Apr 5, 2015 at 2:18 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
<>
>> Hi Yinghai, Toshi
>>
>> In my old patches I did not have these updates as well, and everything
>> was very much usable, for a long time.
>>
>> However. I actually liked these changes in Christoph's patches and
>> thought they should stay, here is why.
>>
>> Today I will be sending patches to make pmem be supported with
>> page-struct as an optional alternative to the use of ioremap.
>> This is for advanced users that wants to RDMA direct_IO and so
>> on directly out of pmem.
> 
> but it is not related.  Should just remove those lines.
> 
> And even his original changes about memblock is not needed.
> 

Never mind. As I said, it hit us once in the past, but due to
a bug. I do not mind; you can remove the max_pfn thing, I can do
without it.

Thanks
Boaz

> | You did not modify memblock_x86_fill() to treat
> | E820_PRAM as E820_RAM, so memblock will not have any
> | entry for E820_PRAM, so you do not need to call memblock_reserve
> | there.
> |
> | And the same time, init_memory_mapping() will call
> | init_range_memory_mapping/for_each_mem_pfn_range() to
> | set kernel mapping for memory range in memblock only.
> | So here calling init_memory_mapping will not do anything.
> | then just drop calling to that init_memory_mapping.
> | --- so will not kernel mapping pmem, is that what you intended to have?
> |
> | After those two changes, You do not need this reserve_pmem at all.
> | Just drop it.
>
Ingo Molnar April 6, 2015, 7:27 a.m. UTC | #7
* Yinghai Lu <yinghai@kernel.org> wrote:

> On Sat, Apr 4, 2015 at 2:40 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Toshi Kani <toshi.kani@hp.com> wrote:
> >
> >> On Fri, 2015-04-03 at 10:12 -0700, Yinghai Lu wrote:
> >> > On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> >> > > On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
> >> >
> >> > should revert those end_of_ram change as attached.
> >>
> >> I confirmed that the pmem driver works with the change, and last_pfn is
> >> updated as expected.
> >
> > Could someone please send the fix with a changelog, etc?
> >
> 
> Why just fold those change into that commit.

Because we try to avoid doing rebases of otherwise tested trees, and 
because fixes will outline the thinking behind the code as well.

Thanks,

	Ingo
Christoph Hellwig April 6, 2015, 3:55 p.m. UTC | #8
On Fri, Apr 03, 2015 at 10:12:39AM -0700, Yinghai Lu wrote:
> > Should we also delete this code, accounting E820_PRAM as ram, along with
> > the deletion of reserve_pmem() in this version?
> 
> should revert those end_of_ram change as attached.

This works fine for me:

Tested-by: Christoph Hellwig <hch@lst.de>


> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)

But I'd prefer not to re-add the argument here, it only obfuscates the
code.
Toshi Kani April 6, 2015, 5:29 p.m. UTC | #9
On Sat, 2015-04-04 at 11:40 +0200, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Fri, 2015-04-03 at 10:12 -0700, Yinghai Lu wrote:
> > > On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > > > On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
> > > >   :
> > > >> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
> > > >>  /*
> > > >>   * Find the highest page frame number we have available
> > > >>   */
> > > >> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> > > >> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
> > > >>  {
> > > >>       int i;
> > > >>       unsigned long last_pfn = 0;
> > > >> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> > > >>               unsigned long start_pfn;
> > > >>               unsigned long end_pfn;
> > > >>
> > > >> -             if (ei->type != type)
> > > >> +             /*
> > > >> +              * Persistent memory is accounted as ram for purposes of
> > > >> +              * establishing max_pfn and mem_map.
> > > >> +              */
> > > >> +             if (ei->type != E820_RAM && ei->type != E820_PRAM)
> > > >>                       continue;
> > > >
> > > > Should we also delete this code, accounting E820_PRAM as ram, along with
> > > > the deletion of reserve_pmem() in this version?
> > > 
> > > should revert those end_of_ram change as attached.
> > 
> > I confirmed that the pmem driver works with the change, and last_pfn is
> > updated as expected.
> 
> Could someone please send the fix with a changelog, etc?

OK, I will send a patch for the fix, with the update Christoph
suggested: not re-adding the 'type' argument to e820_end_pfn().

Yinghai, I will add your sign-off to the patch.  Let me know if you have
any concerns.

Thanks,
-Toshi
Toshi Kani April 6, 2015, 6:23 p.m. UTC | #10
On Mon, 2015-04-06 at 11:26 -0700, Yinghai Lu wrote:
> On Mon, Apr 6, 2015 at 10:29 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Sat, 2015-04-04 at 11:40 +0200, Ingo Molnar wrote:
> 
> > OK, I will send a patch for the fix, with suggested update from
> > Christoph of not to re-add 'type' argument to e820_end_pfn().
> >
> > Yinghai, I will add your sign-off to the patch.  Let me know if you have
> > any concern.
> 
> I think we should restore all to original.
> 
> e820_end_pfn(unsigned long limit_pfn, unsigned type)
> e820_end_of_ram_pfn
> e820_end_of_low_ram_pfn
> 
> otherwise it will cause confusing, because last two really have ram_pfn,
> and the first one does not has ram in the function name.

Good point.  I will revert to the original.

Thanks,
-Toshi
Yinghai Lu April 6, 2015, 6:26 p.m. UTC | #11
On Mon, Apr 6, 2015 at 10:29 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Sat, 2015-04-04 at 11:40 +0200, Ingo Molnar wrote:

> OK, I will send a patch for the fix, with suggested update from
> Christoph of not to re-add 'type' argument to e820_end_pfn().
>
> Yinghai, I will add your sign-off to the patch.  Let me know if you have
> any concern.

I think we should restore everything to the original:

e820_end_pfn(unsigned long limit_pfn, unsigned type)
e820_end_of_ram_pfn
e820_end_of_low_ram_pfn

Otherwise it will cause confusion, because the last two really have ram_pfn
in their names, and the first one does not have ram in the function name.

Thanks

Yinghai

Patch

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e2ce85d..e09a346 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -752,7 +752,7 @@  u64 __init early_reserve_e820(u64 size, u64 align)
 /*
  * Find the highest page frame number we have available
  */
-static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
+static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
 {
 	int i;
 	unsigned long last_pfn = 0;
@@ -763,11 +763,7 @@  static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
 		unsigned long start_pfn;
 		unsigned long end_pfn;
 
-		/*
-		 * Persistent memory is accounted as ram for purposes of
-		 * establishing max_pfn and mem_map.
-		 */
-		if (ei->type != E820_RAM && ei->type != E820_PRAM)
+		if (ei->type != type)
 			continue;
 
 		start_pfn = ei->addr >> PAGE_SHIFT;
@@ -792,12 +788,12 @@  static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
 }
 unsigned long __init e820_end_of_ram_pfn(void)
 {
-	return e820_end_pfn(MAX_ARCH_PFN);
+	return e820_end_pfn(MAX_ARCH_PFN, E820_RAM);
 }
 
 unsigned long __init e820_end_of_low_ram_pfn(void)
 {
-	return e820_end_pfn(1UL << (32-PAGE_SHIFT));
+	return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
 }
 
 static void early_panic(char *msg)