[3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
diff mbox series

Message ID 20200117232254.2792-4-richardw.yang@linux.intel.com
State New
Headers show
Series
  • mm/mremap.c: cleanup move_page_tables() a little
Related show

Commit Message

Wei Yang Jan. 17, 2020, 11:22 p.m. UTC
Use the general helper instead of do it by hand.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/mremap.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Dmitry Osipenko Jan. 26, 2020, 2:47 p.m. UTC | #1
18.01.2020 02:22, Wei Yang пишет:
> Use the general helper instead of do it by hand.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  mm/mremap.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c2af8ba4ba43..a258914f3ee1 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		/* even if next overflowed, extent below will be ok */
> +		next = pmd_addr_end(old_addr, old_end);
>  		extent = next - old_addr;
> -		if (extent > old_end - old_addr)
> -			extent = old_end - old_addr;
>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>  		if (!old_pmd)
>  			continue;
> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  
>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>  			break;
> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> +		next = pmd_addr_end(new_addr, new_addr + len);
>  		if (extent > next - new_addr)
>  			extent = next - new_addr;
>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> 

Hello Wei,

Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
Tegra (ARM32):

  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190

and eventually kernel hangs.

Git's bisection points to this patch and reverting it helps. Please fix,
thanks in advance.
Andrew Morton Jan. 27, 2020, 2:59 a.m. UTC | #2
On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:

> 18.01.2020 02:22, Wei Yang пишет:
> > Use the general helper instead of do it by hand.
> > 
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > ---
> >  mm/mremap.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		/* even if next overflowed, extent below will be ok */
> > +		next = pmd_addr_end(old_addr, old_end);
> >  		extent = next - old_addr;
> > -		if (extent > old_end - old_addr)
> > -			extent = old_end - old_addr;
> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >  		if (!old_pmd)
> >  			continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >  			break;
> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >  		if (extent > next - new_addr)
> >  			extent = next - new_addr;
> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> > 
> 
> Hello Wei,
> 
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
> 
>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> 
> and eventually kernel hangs.
> 
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
patches.
Wei Yang Jan. 28, 2020, 12:43 a.m. UTC | #3
On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>18.01.2020 02:22, Wei Yang пишет:
>> Use the general helper instead of do it by hand.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  mm/mremap.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index c2af8ba4ba43..a258914f3ee1 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>  
>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>  		cond_resched();
>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> -		/* even if next overflowed, extent below will be ok */
>> +		next = pmd_addr_end(old_addr, old_end);
>>  		extent = next - old_addr;
>> -		if (extent > old_end - old_addr)
>> -			extent = old_end - old_addr;
>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>  		if (!old_pmd)
>>  			continue;
>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>  
>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>  			break;
>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>  		if (extent > next - new_addr)
>>  			extent = next - new_addr;
>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> 
>
>Hello Wei,
>
>Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>Tegra (ARM32):
>
>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>

Thanks.

Would you mind letting me know which case you are testing? Or the special
thing is 32-bit platform?

>and eventually kernel hangs.
>
>Git's bisection points to this patch and reverting it helps. Please fix,
>thanks in advance.
Dmitry Osipenko Jan. 28, 2020, 3:59 p.m. UTC | #4
28.01.2020 03:43, Wei Yang пишет:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang пишет:
>>> Use the general helper instead of do it by hand.
>>>
>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> ---
>>>  mm/mremap.c | 7 ++-----
>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>  		cond_resched();
>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> -		/* even if next overflowed, extent below will be ok */
>>> +		next = pmd_addr_end(old_addr, old_end);
>>>  		extent = next - old_addr;
>>> -		if (extent > old_end - old_addr)
>>> -			extent = old_end - old_addr;
>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>  		if (!old_pmd)
>>>  			continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>  			break;
>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>  		if (extent > next - new_addr)
>>>  			extent = next - new_addr;
>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
> 
> Thanks.
> 
> Would you mind letting me know which case you are testing?

Nothing special, systemd starts to fall apart during boot.

> Or the special thing is 32-bit platform?
I have a limited knowledge about mm/, so can't provide detailed explanation.

Please take a look at this:

[1]
https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210

[2]
https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
Wei Yang Jan. 28, 2020, 11:29 p.m. UTC | #5
On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>28.01.2020 03:43, Wei Yang пишет:
>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>> 18.01.2020 02:22, Wei Yang пишет:
>>>> Use the general helper instead of do it by hand.
>>>>
>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>> ---
>>>>  mm/mremap.c | 7 ++-----
>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>> --- a/mm/mremap.c
>>>> +++ b/mm/mremap.c
>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>  
>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>  		cond_resched();
>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>> -		/* even if next overflowed, extent below will be ok */
>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>  		extent = next - old_addr;
>>>> -		if (extent > old_end - old_addr)
>>>> -			extent = old_end - old_addr;
>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>  		if (!old_pmd)
>>>>  			continue;
>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>  
>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>  			break;
>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>  		if (extent > next - new_addr)
>>>>  			extent = next - new_addr;
>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>
>>>
>>> Hello Wei,
>>>
>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>> Tegra (ARM32):
>>>
>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>
>> 
>> Thanks.
>> 
>> Would you mind letting me know which case you are testing?
>
>Nothing special, systemd starts to fall apart during boot.
>
>> Or the special thing is 32-bit platform?
>I have a limited knowledge about mm/, so can't provide detailed explanation.
>
>Please take a look at this:
>
>[1]
>https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>
>[2]
>https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>
>[3]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b

Thanks, I see the difference here.

If this is the case, we can't use pmd_addr_end() to simplify the calculation.
This changes the behavior.

I would prepare another patch set to fix this. Would you mind helping me
verify on your platform?
Dmitry Osipenko Jan. 28, 2020, 11:35 p.m. UTC | #6
29.01.2020 02:29, Wei Yang пишет:
> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>> 28.01.2020 03:43, Wei Yang пишет:
>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>> 18.01.2020 02:22, Wei Yang пишет:
>>>>> Use the general helper instead of do it by hand.
>>>>>
>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>> ---
>>>>>  mm/mremap.c | 7 ++-----
>>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>> --- a/mm/mremap.c
>>>>> +++ b/mm/mremap.c
>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>  
>>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>>  		cond_resched();
>>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>> -		/* even if next overflowed, extent below will be ok */
>>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>>  		extent = next - old_addr;
>>>>> -		if (extent > old_end - old_addr)
>>>>> -			extent = old_end - old_addr;
>>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>>  		if (!old_pmd)
>>>>>  			continue;
>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>  
>>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>>  			break;
>>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>>  		if (extent > next - new_addr)
>>>>>  			extent = next - new_addr;
>>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>
>>>>
>>>> Hello Wei,
>>>>
>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>> Tegra (ARM32):
>>>>
>>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>
>>>
>>> Thanks.
>>>
>>> Would you mind letting me know which case you are testing?
>>
>> Nothing special, systemd starts to fall apart during boot.
>>
>>> Or the special thing is 32-bit platform?
>> I have a limited knowledge about mm/, so can't provide detailed explanation.
>>
>> Please take a look at this:
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>
>> [2]
>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>
>> [3]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
> 
> Thanks, I see the difference here.
> 
> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
> This changes the behavior.
> 
> I would prepare another patch set to fix this. Would you mind helping me
> verify on your platform?

Sure, please feel free to CC me on that patch.
Wei Yang Jan. 29, 2020, 12:28 a.m. UTC | #7
On Wed, Jan 29, 2020 at 02:35:25AM +0300, Dmitry Osipenko wrote:
>29.01.2020 02:29, Wei Yang пишет:
>> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>>> 28.01.2020 03:43, Wei Yang пишет:
>>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>>> 18.01.2020 02:22, Wei Yang пишет:
>>>>>> Use the general helper instead of do it by hand.
>>>>>>
>>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>>> ---
>>>>>>  mm/mremap.c | 7 ++-----
>>>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>  
>>>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>>>  		cond_resched();
>>>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>>> -		/* even if next overflowed, extent below will be ok */
>>>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>>>  		extent = next - old_addr;
>>>>>> -		if (extent > old_end - old_addr)
>>>>>> -			extent = old_end - old_addr;
>>>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>>>  		if (!old_pmd)
>>>>>>  			continue;
>>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>  
>>>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>>>  			break;
>>>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>>>  		if (extent > next - new_addr)
>>>>>>  			extent = next - new_addr;
>>>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>>
>>>>>
>>>>> Hello Wei,
>>>>>
>>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>>> Tegra (ARM32):
>>>>>
>>>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Would you mind letting me know which case you are testing?
>>>
>>> Nothing special, systemd starts to fall apart during boot.
>>>
>>>> Or the special thing is 32-bit platform?
>>> I have a limited knowledge about mm/, so can't provide detailed explanation.
>>>
>>> Please take a look at this:
>>>
>>> [1]
>>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>>
>>> [2]
>>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>>
>>> [3]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
>> 
>> Thanks, I see the difference here.
>> 
>> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
>> This changes the behavior.
>> 
>> I would prepare another patch set to fix this. Would you mind helping me
>> verify on your platform?
>
>Sure, please feel free to CC me on that patch.

Thanks, you are in the cc list of v2.

Hope this one works fine on ARM.
Russell King - ARM Linux admin Jan. 29, 2020, 9:47 a.m. UTC | #8
On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> 18.01.2020 02:22, Wei Yang пишет:
> > Use the general helper instead of do it by hand.
> > 
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > ---
> >  mm/mremap.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		/* even if next overflowed, extent below will be ok */
> > +		next = pmd_addr_end(old_addr, old_end);
> >  		extent = next - old_addr;
> > -		if (extent > old_end - old_addr)
> > -			extent = old_end - old_addr;
> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >  		if (!old_pmd)
> >  			continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >  			break;
> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >  		if (extent > next - new_addr)
> >  			extent = next - new_addr;
> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> > 
> 
> Hello Wei,
> 
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
> 
>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> 
> and eventually kernel hangs.
> 
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

The above is definitely wrong - pXX_addr_end() are designed to be used
with an address index within the pXX table table and the address index
of either the last entry in the same pXX table or the beginning of the
_next_ pXX table.  Arbitary end address indicies are not allowed.

When page tables are "rolled up" when levels don't exist, it is common
practice for these macros to just return their end address index.
Hence, if they are used with arbitary end address indicies, then the
iteration will fail.

The only way to do this is:

	next = pmd_addr_end(old_addr,
			pud_addr_end(old_addr,
				p4d_addr_end(old_addr,
					pgd_addr_end(old_addr, old_end))));

which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
the correct end argument.  However, that's a more complex and verbose,
and likely less efficient than the current code.

I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
and trying to "clean it up" will just result in less efficient or
broken code.
Dmitry Osipenko Jan. 29, 2020, 2:21 p.m. UTC | #9
29.01.2020 12:47, Russell King - ARM Linux admin пишет:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang пишет:
>>> Use the general helper instead of do it by hand.
>>>
>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> ---
>>>  mm/mremap.c | 7 ++-----
>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>  		cond_resched();
>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> -		/* even if next overflowed, extent below will be ok */
>>> +		next = pmd_addr_end(old_addr, old_end);
>>>  		extent = next - old_addr;
>>> -		if (extent > old_end - old_addr)
>>> -			extent = old_end - old_addr;
>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>  		if (!old_pmd)
>>>  			continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>  			break;
>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>  		if (extent > next - new_addr)
>>>  			extent = next - new_addr;
>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
> 
> The above is definitely wrong - pXX_addr_end() are designed to be used
> with an address index within the pXX table table and the address index
> of either the last entry in the same pXX table or the beginning of the
> _next_ pXX table.  Arbitary end address indicies are not allowed.
> 
> When page tables are "rolled up" when levels don't exist, it is common
> practice for these macros to just return their end address index.
> Hence, if they are used with arbitary end address indicies, then the
> iteration will fail.
> 
> The only way to do this is:
> 
> 	next = pmd_addr_end(old_addr,
> 			pud_addr_end(old_addr,
> 				p4d_addr_end(old_addr,
> 					pgd_addr_end(old_addr, old_end))));
> 
> which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> the correct end argument.  However, that's a more complex and verbose,
> and likely less efficient than the current code.
> 
> I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> and trying to "clean it up" will just result in less efficient or
> broken code.
> 

Hello Russell,

Thank you very much for the extra clarification!
Dmitry Osipenko Jan. 29, 2020, 5:18 p.m. UTC | #10
27.01.2020 05:59, Andrew Morton пишет:
> On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:
...
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
> 
> Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> patches.
> 

Hello Andrew,

FYI, I'm still seeing the offending patches in the today's next-20200129.
Dmitry Osipenko Jan. 29, 2020, 6:56 p.m. UTC | #11
29.01.2020 03:28, Wei Yang пишет:
...
>>> I would prepare another patch set to fix this. Would you mind helping me
>>> verify on your platform?
>>
>> Sure, please feel free to CC me on that patch.
> 
> Thanks, you are in the cc list of v2.
> 
> Hope this one works fine on ARM.

Okay, I'll reply to the v2 after some more extensive testing (tomorrow).
Wei Yang Jan. 29, 2020, 9:57 p.m. UTC | #12
On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang пишет:
>> > Use the general helper instead of do it by hand.
>> > 
>> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> > ---
>> >  mm/mremap.c | 7 ++-----
>> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/mm/mremap.c b/mm/mremap.c
>> > index c2af8ba4ba43..a258914f3ee1 100644
>> > --- a/mm/mremap.c
>> > +++ b/mm/mremap.c
>> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >  
>> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >  		cond_resched();
>> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> > -		/* even if next overflowed, extent below will be ok */
>> > +		next = pmd_addr_end(old_addr, old_end);
>> >  		extent = next - old_addr;
>> > -		if (extent > old_end - old_addr)
>> > -			extent = old_end - old_addr;
>> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >  		if (!old_pmd)
>> >  			continue;
>> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >  
>> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >  			break;
>> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >  		if (extent > next - new_addr)
>> >  			extent = next - new_addr;
>> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> > 
>> 
>> Hello Wei,
>> 
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>> 
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> 
>> and eventually kernel hangs.
>> 
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
>
>The above is definitely wrong - pXX_addr_end() are designed to be used
>with an address index within the pXX table table and the address index
>of either the last entry in the same pXX table or the beginning of the
>_next_ pXX table.  Arbitary end address indicies are not allowed.
>

#define pmd_addr_end(addr, end)						\
({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
})

If my understanding is correct, the definition here align the addr to next PMD
boundary or end.

I don't see the possibility to across another PMD. Do I miss something?

>When page tables are "rolled up" when levels don't exist, it is common
>practice for these macros to just return their end address index.
>Hence, if they are used with arbitary end address indicies, then the
>iteration will fail.
>
>The only way to do this is:
>
>	next = pmd_addr_end(old_addr,
>			pud_addr_end(old_addr,
>				p4d_addr_end(old_addr,
>					pgd_addr_end(old_addr, old_end))));
>
>which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>the correct end argument.  However, that's a more complex and verbose,
>and likely less efficient than the current code.
>
>I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>and trying to "clean it up" will just result in less efficient or
>broken code.
>
>-- 
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up
Russell King - ARM Linux admin Jan. 29, 2020, 11:24 p.m. UTC | #13
On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> 18.01.2020 02:22, Wei Yang пишет:
> >> > Use the general helper instead of do it by hand.
> >> > 
> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> > ---
> >> >  mm/mremap.c | 7 ++-----
> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >> > 
> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> > --- a/mm/mremap.c
> >> > +++ b/mm/mremap.c
> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >  
> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> >  		cond_resched();
> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> > -		/* even if next overflowed, extent below will be ok */
> >> > +		next = pmd_addr_end(old_addr, old_end);
> >> >  		extent = next - old_addr;
> >> > -		if (extent > old_end - old_addr)
> >> > -			extent = old_end - old_addr;
> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> >  		if (!old_pmd)
> >> >  			continue;
> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >  
> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> >  			break;
> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >> >  		if (extent > next - new_addr)
> >> >  			extent = next - new_addr;
> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> > 
> >> 
> >> Hello Wei,
> >> 
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >> 
> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >> 
> >> and eventually kernel hangs.
> >> 
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> >
> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >with an address index within the pXX table table and the address index
> >of either the last entry in the same pXX table or the beginning of the
> >_next_ pXX table.  Arbitary end address indicies are not allowed.
> >
> 
> #define pmd_addr_end(addr, end)						\
> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
> })
> 
> If my understanding is correct, the definition here align the addr to next PMD
> boundary or end.
> 
> I don't see the possibility to across another PMD. Do I miss something?

Look at the definition of p*_addr_end() that are used when page tables
are rolled up.

> >When page tables are "rolled up" when levels don't exist, it is common
> >practice for these macros to just return their end address index.
> >Hence, if they are used with arbitary end address indicies, then the
> >iteration will fail.
> >
> >The only way to do this is:
> >
> >	next = pmd_addr_end(old_addr,
> >			pud_addr_end(old_addr,
> >				p4d_addr_end(old_addr,
> >					pgd_addr_end(old_addr, old_end))));
> >
> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> >the correct end argument.  However, that's a more complex and verbose,
> >and likely less efficient than the current code.
> >
> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> >and trying to "clean it up" will just result in less efficient or
> >broken code.
> >
> >-- 
> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> >According to speedtest.net: 11.9Mbps down 500kbps up
> 
> -- 
> Wei Yang
> Help you, Help me
>
Andrew Morton Jan. 29, 2020, 11:59 p.m. UTC | #14
On Wed, 29 Jan 2020 20:18:56 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:

> 27.01.2020 05:59, Andrew Morton пишет:
> > On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:
> ...
> >> Hello Wei,
> >>
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >>
> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >>
> >> and eventually kernel hangs.
> >>
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> > 
> > Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> > patches.
> > 
> 
> Hello Andrew,
> 
> FYI, I'm still seeing the offending patches in the today's next-20200129.

hm, me too.  Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
use pmd_addr_end to calculate next in move_page_tables()") is still in
the -next lineup?  It was dropped from -mm on Jan 26?
Stephen Rothwell Jan. 30, 2020, 12:28 a.m. UTC | #15
Hi Andrew,

On Wed, 29 Jan 2020 15:59:07 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> hm, me too.  Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
> use pmd_addr_end to calculate next in move_page_tables()") is still in
> the -next lineup?  It was dropped from -mm on Jan 26?

The mmotm 2020-01-28-20-05 arrived just to late for yesterday's
linux-next (it will be in today's linux-next).  The mmotm before that
was 2020-01-23-21-12.  I attempt to fetch mmotm (along with all the
remaining unmerged trees) about every 30 minutes (sometimes more often)
during the day.
Wei Yang Jan. 30, 2020, 1:30 a.m. UTC | #16
On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> 18.01.2020 02:22, Wei Yang пишет:
>> >> > Use the general helper instead of do it by hand.
>> >> > 
>> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> > ---
>> >> >  mm/mremap.c | 7 ++-----
>> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> >> > 
>> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> > --- a/mm/mremap.c
>> >> > +++ b/mm/mremap.c
>> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >  
>> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> >  		cond_resched();
>> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> > -		/* even if next overflowed, extent below will be ok */
>> >> > +		next = pmd_addr_end(old_addr, old_end);
>> >> >  		extent = next - old_addr;
>> >> > -		if (extent > old_end - old_addr)
>> >> > -			extent = old_end - old_addr;
>> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> >  		if (!old_pmd)
>> >> >  			continue;
>> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >  
>> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> >  			break;
>> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >> >  		if (extent > next - new_addr)
>> >> >  			extent = next - new_addr;
>> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> > 
>> >> 
>> >> Hello Wei,
>> >> 
>> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> Tegra (ARM32):
>> >> 
>> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >> 
>> >> and eventually kernel hangs.
>> >> 
>> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> thanks in advance.
>> >
>> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >with an address index within the pXX table table and the address index
>> >of either the last entry in the same pXX table or the beginning of the
>> >_next_ pXX table.  Arbitary end address indicies are not allowed.
>> >
>> 
>> #define pmd_addr_end(addr, end)						\
>> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
>> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
>> })
>> 
>> If my understanding is correct, the definition here align the addr to next PMD
>> boundary or end.
>> 
>> I don't see the possibility to across another PMD. Do I miss something?
>
>Look at the definition of p*_addr_end() that are used when page tables
>are rolled up.
>

Sorry, I don't get your point.

What's the meaning of "roll up" here?

Would you mind giving me an example? I see pmd_addr_end() is not used in many
places in core kernel. By glancing those usages, all the places use it like
pmd_addr_end(addr, end). Seems no specially handing on the end address.

Or you mean the case when pmd_addr_end() is defined to return "end" directly? 

>> >When page tables are "rolled up" when levels don't exist, it is common
>> >practice for these macros to just return their end address index.
>> >Hence, if they are used with arbitary end address indicies, then the
>> >iteration will fail.
>> >
>> >The only way to do this is:
>> >
>> >	next = pmd_addr_end(old_addr,
>> >			pud_addr_end(old_addr,
>> >				p4d_addr_end(old_addr,
>> >					pgd_addr_end(old_addr, old_end))));
>> >
>> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>> >the correct end argument.  However, that's a more complex and verbose,
>> >and likely less efficient than the current code.
>> >
>> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>> >and trying to "clean it up" will just result in less efficient or
>> >broken code.
>> >
>> >-- 
>> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>> >According to speedtest.net: 11.9Mbps down 500kbps up
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>> 
>
>-- 
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up
Russell King - ARM Linux admin Jan. 30, 2020, 2:15 p.m. UTC | #17
On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> >> 18.01.2020 02:22, Wei Yang пишет:
> >> >> > Use the general helper instead of do it by hand.
> >> >> > 
> >> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >> > ---
> >> >> >  mm/mremap.c | 7 ++-----
> >> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >> >> > 
> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> >> > --- a/mm/mremap.c
> >> >> > +++ b/mm/mremap.c
> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >  
> >> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> >> >  		cond_resched();
> >> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> >> > -		/* even if next overflowed, extent below will be ok */
> >> >> > +		next = pmd_addr_end(old_addr, old_end);
> >> >> >  		extent = next - old_addr;
> >> >> > -		if (extent > old_end - old_addr)
> >> >> > -			extent = old_end - old_addr;
> >> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> >> >  		if (!old_pmd)
> >> >> >  			continue;
> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >  
> >> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> >> >  			break;
> >> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >> >> >  		if (extent > next - new_addr)
> >> >> >  			extent = next - new_addr;
> >> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> >> > 
> >> >> 
> >> >> Hello Wei,
> >> >> 
> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> >> Tegra (ARM32):
> >> >> 
> >> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >> >> 
> >> >> and eventually kernel hangs.
> >> >> 
> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> >> thanks in advance.
> >> >
> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >> >with an address index within the pXX table table and the address index
> >> >of either the last entry in the same pXX table or the beginning of the
> >> >_next_ pXX table.  Arbitary end address indicies are not allowed.
> >> >
> >> 
> >> #define pmd_addr_end(addr, end)						\
> >> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
> >> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
> >> })
> >> 
> >> If my understanding is correct, the definition here align the addr to next PMD
> >> boundary or end.
> >> 
> >> I don't see the possibility to across another PMD. Do I miss something?
> >
> >Look at the definition of p*_addr_end() that are used when page tables
> >are rolled up.
> >
> 
> Sorry, I don't get your point.
> 
> What's the meaning of "roll up" here?
> 
> Would you mind giving me an example? I see pmd_addr_end() is not used in many
> places in core kernel. By glancing those usages, all the places use it like
> pmd_addr_end(addr, end). Seems no specially handing on the end address.
> 
> Or you mean the case when pmd_addr_end() is defined to return "end" directly? 

Not all hardware has five levels of page tables.  When hardware does not
have five levels, it is common to "roll up" some of the page tables into
others.

There are generic ways to implement this, which include using:

include/asm-generic/pgtable-nop4d.h
include/asm-generic/pgtable-nopud.h
include/asm-generic/pgtable-nopmd.h

and then there's architecture ways to implement this.  32-bit ARM takes
its implementation for PMD not from the generic version, which
post-dates 32-bit ARM, but from how page table roll-up was implemented
back at the time when the current ARM scheme was devised.  The generic
scheme is unsuitable for 32-bit ARM since we do more than just roll-up
page tables, but this is irrelevent for this discussion.

All three of the generic implementations, and 32-bit ARM, define the
pXX_addr_end() macros thusly:

include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)

since, as I stated, pXX_addr_end() expects its "end" argument to be
the address index of the next entry in the immediately upper page
table level, or the address index of the last entry we wish to
process, which ever is smaller.

If it's larger than the address index of the next entry in the
immediately upper page table level, then the effect of all these
macros will be to walk off the end of the current level of page
table.

To see how they _should_ be used, see the loops in free_pgd_range()
and the free_pXX_range() functions called from there and below.

In all cases when the pXX_addr_end() macro was introduced, what I state
above holds true - and I believe still holds true today, until this
patch that has reportedly caused issues.
Wei Yang Jan. 30, 2020, 9:57 p.m. UTC | #18
On Thu, Jan 30, 2020 at 02:15:05PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> >> 18.01.2020 02:22, Wei Yang пишет:
>> >> >> > Use the general helper instead of do it by hand.
>> >> >> > 
>> >> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >> > ---
>> >> >> >  mm/mremap.c | 7 ++-----
>> >> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> >> >> > 
>> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> >> > --- a/mm/mremap.c
>> >> >> > +++ b/mm/mremap.c
>> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >  
>> >> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> >> >  		cond_resched();
>> >> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > -		/* even if next overflowed, extent below will be ok */
>> >> >> > +		next = pmd_addr_end(old_addr, old_end);
>> >> >> >  		extent = next - old_addr;
>> >> >> > -		if (extent > old_end - old_addr)
>> >> >> > -			extent = old_end - old_addr;
>> >> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> >> >  		if (!old_pmd)
>> >> >> >  			continue;
>> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >  
>> >> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> >> >  			break;
>> >> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >> >> >  		if (extent > next - new_addr)
>> >> >> >  			extent = next - new_addr;
>> >> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> >> > 
>> >> >> 
>> >> >> Hello Wei,
>> >> >> 
>> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> >> Tegra (ARM32):
>> >> >> 
>> >> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >> >> 
>> >> >> and eventually kernel hangs.
>> >> >> 
>> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> >> thanks in advance.
>> >> >
>> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >> >with an address index within the pXX table table and the address index
>> >> >of either the last entry in the same pXX table or the beginning of the
>> >> >_next_ pXX table.  Arbitary end address indicies are not allowed.
>> >> >
>> >> 
>> >> #define pmd_addr_end(addr, end)						\
>> >> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
>> >> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
>> >> })
>> >> 
>> >> If my understanding is correct, the definition here align the addr to next PMD
>> >> boundary or end.
>> >> 
>> >> I don't see the possibility to across another PMD. Do I miss something?
>> >
>> >Look at the definition of p*_addr_end() that are used when page tables
>> >are rolled up.
>> >
>> 
>> Sorry, I don't get your point.
>> 
>> What's the meaning of "roll up" here?
>> 
>> Would you mind giving me an example? I see pmd_addr_end() is not used in many
>> places in core kernel. By glancing those usages, all the places use it like
>> pmd_addr_end(addr, end). Seems no specially handing on the end address.
>> 
>> Or you mean the case when pmd_addr_end() is defined to return "end" directly? 
>
>Not all hardware has five levels of page tables.  When hardware does not
>have five levels, it is common to "roll up" some of the page tables into
>others.
>
>There are generic ways to implement this, which include using:
>
>include/asm-generic/pgtable-nop4d.h
>include/asm-generic/pgtable-nopud.h
>include/asm-generic/pgtable-nopmd.h
>
>and then there's architecture ways to implement this.  32-bit ARM takes
>its implementation for PMD not from the generic version, which
>post-dates 32-bit ARM, but from how page table roll-up was implemented
>back at the time when the current ARM scheme was devised.  The generic
>scheme is unsuitable for 32-bit ARM since we do more than just roll-up
>page tables, but this is irrelevent for this discussion.
>
>All three of the generic implementations, and 32-bit ARM, define the
>pXX_addr_end() macros thusly:
>
>include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
>arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)
>
>since, as I stated, pXX_addr_end() expects its "end" argument to be
>the address index of the next entry in the immediately upper page
>table level, or the address index of the last entry we wish to
>process, which ever is smaller.
>
>If it's larger than the address index of the next entry in the
>immediately upper page table level, then the effect of all these
>macros will be to walk off the end of the current level of page
>table.
>
>To see how they _should_ be used, see the loops in free_pgd_range()
>and the free_pXX_range() functions called from there and below.
>
>In all cases when the pXX_addr_end() macro was introduced, what I state
>above holds true - and I believe still holds true today, until this
>patch that has reportedly caused issues.
>

Thanks for your patience in explaining this for me.

I got your point. This is my fault in understanding the code.

>-- 
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

Patch
diff mbox series

diff --git a/mm/mremap.c b/mm/mremap.c
index c2af8ba4ba43..a258914f3ee1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -253,11 +253,8 @@  unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
-		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		/* even if next overflowed, extent below will be ok */
+		next = pmd_addr_end(old_addr, old_end);
 		extent = next - old_addr;
-		if (extent > old_end - old_addr)
-			extent = old_end - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
@@ -301,7 +298,7 @@  unsigned long move_page_tables(struct vm_area_struct *vma,
 
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
 			break;
-		next = (new_addr + PMD_SIZE) & PMD_MASK;
+		next = pmd_addr_end(new_addr, new_addr + len);
 		if (extent > next - new_addr)
 			extent = next - new_addr;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,