diff mbox series

[5.15] mm: validate buddy page before using

Message ID 20220616161746.3565225-6-xianting.tian@linux.alibaba.com (mailing list archive)
State New
Headers show
Series [5.15] mm: validate buddy page before using | expand

Commit Message

Xianting Tian June 16, 2022, 4:17 p.m. UTC
Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
can be fixed in a similar way too.

In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
from mm/page_alloc.c to mm/internal.h

In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
so it would have pfn_base=512 and mem_map began with 512th PFN when
CONFIG_FLATMEM=y.
But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
less than the pfn_base value. We need page_is_buddy() to verify the buddy to
prevent accessing an invalid buddy.

Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
Cc: stable@vger.kernel.org
Reported-by: zjb194813@alibaba-inc.com
Reported-by: tianhu.hh@alibaba-inc.com
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
---
 mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
 mm/page_alloc.c     | 37 +++----------------------------------
 mm/page_isolation.c |  3 ++-
 3 files changed, 39 insertions(+), 35 deletions(-)

Comments

Greg Kroah-Hartman June 20, 2022, 10:17 a.m. UTC | #1
On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> can be fixed in a similar way too.
> 
> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> from mm/page_alloc.c to mm/internal.h
> 
> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> so it would have pfn_base=512 and mem_map began with 512th PFN when
> CONFIG_FLATMEM=y.
> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> prevent accessing an invalid buddy.
> 
> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> Cc: stable@vger.kernel.org
> Reported-by: zjb194813@alibaba-inc.com
> Reported-by: tianhu.hh@alibaba-inc.com
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>  mm/page_alloc.c     | 37 +++----------------------------------
>  mm/page_isolation.c |  3 ++-
>  3 files changed, 39 insertions(+), 35 deletions(-)

What is the commit id of this in Linus's tree?

thanks,

greg k-h
Xianting Tian June 20, 2022, 10:54 a.m. UTC | #2
在 2022/6/20 下午6:17, Greg KH 写道:
> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>> can be fixed in a similar way too.
>>
>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>> from mm/page_alloc.c to mm/internal.h
>>
>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>> CONFIG_FLATMEM=y.
>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>> prevent accessing an invalid buddy.
>>
>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>> Cc: stable@vger.kernel.org
>> Reported-by: zjb194813@alibaba-inc.com
>> Reported-by: tianhu.hh@alibaba-inc.com
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>> ---
>>   mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>   mm/page_alloc.c     | 37 +++----------------------------------
>>   mm/page_isolation.c |  3 ++-
>>   3 files changed, 39 insertions(+), 35 deletions(-)
> What is the commit id of this in Linus's tree?

It is also this one,

commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
Author: Zi Yan <ziy@nvidia.com>
Date:   Wed Mar 30 15:45:43 2022 -0700

     mm: page_alloc: validate buddy before check its migratetype.

     Whenever a buddy page is found, page_is_buddy() should be called to
     check its validity.  Add the missing check during pageblock merge 
check.

     Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging 
non-fallbackable pageblocks with others")
     Link: 
https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
     Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
     Signed-off-by: Zi Yan <ziy@nvidia.com>
     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

>
> thanks,
>
> greg k-h
Greg Kroah-Hartman June 20, 2022, 11:42 a.m. UTC | #3
On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
> 
> 在 2022/6/20 下午6:17, Greg KH 写道:
> > On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> > > Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> > > fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> > > can be fixed in a similar way too.
> > > 
> > > In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> > > from mm/page_alloc.c to mm/internal.h
> > > 
> > > In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> > > so it would have pfn_base=512 and mem_map began with 512th PFN when
> > > CONFIG_FLATMEM=y.
> > > But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> > > less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> > > prevent accessing an invalid buddy.
> > > 
> > > Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> > > Cc: stable@vger.kernel.org
> > > Reported-by: zjb194813@alibaba-inc.com
> > > Reported-by: tianhu.hh@alibaba-inc.com
> > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > ---
> > >   mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
> > >   mm/page_alloc.c     | 37 +++----------------------------------
> > >   mm/page_isolation.c |  3 ++-
> > >   3 files changed, 39 insertions(+), 35 deletions(-)
> > What is the commit id of this in Linus's tree?
> 
> It is also this one,
> 
> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Wed Mar 30 15:45:43 2022 -0700
> 
>     mm: page_alloc: validate buddy before check its migratetype.
> 
>     Whenever a buddy page is found, page_is_buddy() should be called to
>     check its validity.  Add the missing check during pageblock merge check.
> 
>     Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> pageblocks with others")
>     Link:
> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>     Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>     Signed-off-by: Zi Yan <ziy@nvidia.com>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This commit looks nothing like what you posted here.

Why the vast difference with no explaination as to why these are so
different from the other backports you provided here?  Also why is the
subject lines changed?

Something went really wrong here, I'm going to drop all of these from
the stable queues and wait for a full series of all new backports, with
the correct upstream commit id added, and the original signed-off-by
lines preserved.

thanks,

greg k-h
Xianting Tian June 20, 2022, 11:57 a.m. UTC | #4
在 2022/6/20 下午7:42, Greg KH 写道:
> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>> can be fixed in a similar way too.
>>>>
>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>> from mm/page_alloc.c to mm/internal.h
>>>>
>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>> CONFIG_FLATMEM=y.
>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>> prevent accessing an invalid buddy.
>>>>
>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>> Cc: stable@vger.kernel.org
>>>> Reported-by: zjb194813@alibaba-inc.com
>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>> ---
>>>>    mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>    mm/page_alloc.c     | 37 +++----------------------------------
>>>>    mm/page_isolation.c |  3 ++-
>>>>    3 files changed, 39 insertions(+), 35 deletions(-)
>>> What is the commit id of this in Linus's tree?
>> It is also this one,
>>
>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>
>>      mm: page_alloc: validate buddy before check its migratetype.
>>
>>      Whenever a buddy page is found, page_is_buddy() should be called to
>>      check its validity.  Add the missing check during pageblock merge check.
>>
>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>> pageblocks with others")
>>      Link:
>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>      Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>      Signed-off-by: Zi Yan <ziy@nvidia.com>
>>      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> This commit looks nothing like what you posted here.
>
> Why the vast difference with no explaination as to why these are so
> different from the other backports you provided here?  Also why is the
> subject lines changed?

Yes, the changes of 5.15 are not same with others branches, because we 
need additional fix for 5.15,

You can check it in the thread:

https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ 
<https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>

Right. But pfn_valid_within() was removed since 5.15. So your fix is
required for kernels between 5.15 and 5.17 (inclusive).

> Something went really wrong here, I'm going to drop all of these from
> the stable queues and wait for a full series of all new backports, with
> the correct upstream commit id added, and the original signed-off-by
> lines preserved.
>
> thanks,
>
> greg k-h
Greg Kroah-Hartman June 20, 2022, 12:06 p.m. UTC | #5
On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
> 
> 在 2022/6/20 下午7:42, Greg KH 写道:
> > On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
> > > 在 2022/6/20 下午6:17, Greg KH 写道:
> > > > On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> > > > > Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> > > > > fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> > > > > can be fixed in a similar way too.
> > > > > 
> > > > > In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> > > > > from mm/page_alloc.c to mm/internal.h
> > > > > 
> > > > > In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> > > > > so it would have pfn_base=512 and mem_map began with 512th PFN when
> > > > > CONFIG_FLATMEM=y.
> > > > > But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> > > > > less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> > > > > prevent accessing an invalid buddy.
> > > > > 
> > > > > Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> > > > > Cc: stable@vger.kernel.org
> > > > > Reported-by: zjb194813@alibaba-inc.com
> > > > > Reported-by: tianhu.hh@alibaba-inc.com
> > > > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > > > ---
> > > > >    mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
> > > > >    mm/page_alloc.c     | 37 +++----------------------------------
> > > > >    mm/page_isolation.c |  3 ++-
> > > > >    3 files changed, 39 insertions(+), 35 deletions(-)
> > > > What is the commit id of this in Linus's tree?
> > > It is also this one,
> > > 
> > > commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
> > > Author: Zi Yan <ziy@nvidia.com>
> > > Date:   Wed Mar 30 15:45:43 2022 -0700
> > > 
> > >      mm: page_alloc: validate buddy before check its migratetype.
> > > 
> > >      Whenever a buddy page is found, page_is_buddy() should be called to
> > >      check its validity.  Add the missing check during pageblock merge check.
> > > 
> > >      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> > > pageblocks with others")
> > >      Link:
> > > https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
> > >      Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
> > >      Signed-off-by: Zi Yan <ziy@nvidia.com>
> > >      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > This commit looks nothing like what you posted here.
> > 
> > Why the vast difference with no explaination as to why these are so
> > different from the other backports you provided here?  Also why is the
> > subject lines changed?
> 
> Yes, the changes of 5.15 are not same with others branches, because we need
> additional fix for 5.15,
> 
> You can check it in the thread:
> 
> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
> 
> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> required for kernels between 5.15 and 5.17 (inclusive).

What is "your fix" here?

This change differs a lot from what is in Linus's tree now, so this all
needs to be resend and fixed up as I mention above if we are going to be
able to take this.  As-is, it's all not correct so are dropped.

thanks,

greg k-h
Xianting Tian June 20, 2022, 12:18 p.m. UTC | #6
在 2022/6/20 下午8:06, Greg KH 写道:
> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>> can be fixed in a similar way too.
>>>>>>
>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>
>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>> CONFIG_FLATMEM=y.
>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>> prevent accessing an invalid buddy.
>>>>>>
>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>> ---
>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>     mm/page_isolation.c |  3 ++-
>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
>>>>> What is the commit id of this in Linus's tree?
>>>> It is also this one,
>>>>
>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>> Author: Zi Yan <ziy@nvidia.com>
>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>
>>>>       mm: page_alloc: validate buddy before check its migratetype.
>>>>
>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
>>>>       check its validity.  Add the missing check during pageblock merge check.
>>>>
>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>> pageblocks with others")
>>>>       Link:
>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> This commit looks nothing like what you posted here.
>>>
>>> Why the vast difference with no explaination as to why these are so
>>> different from the other backports you provided here?  Also why is the
>>> subject lines changed?
>> Yes, the changes of 5.15 are not same with others branches, because we need
>> additional fix for 5.15,
>>
>> You can check it in the thread:
>>
>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>
>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>> required for kernels between 5.15 and 5.17 (inclusive).
> What is "your fix" here?
>
> This change differs a lot from what is in Linus's tree now, so this all
> needs to be resend and fixed up as I mention above if we are going to be
> able to take this.  As-is, it's all not correct so are dropped.

I think, for branches except 5.15,  you can just backport Zi Yan's 
commit 787af64d05cd in Linus tree. I won't send more patches further,

For 5.15, because it need additional fix except commit 787af64d05cd,  I 
will send a new patch as your comments.

Is it ok for you?

>
> thanks,
>
> greg k-h
Greg Kroah-Hartman June 20, 2022, 12:24 p.m. UTC | #7
On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
> 
> 在 2022/6/20 下午8:06, Greg KH 写道:
> > On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
> > > 在 2022/6/20 下午7:42, Greg KH 写道:
> > > > On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
> > > > > 在 2022/6/20 下午6:17, Greg KH 写道:
> > > > > > On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> > > > > > > Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> > > > > > > fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> > > > > > > can be fixed in a similar way too.
> > > > > > > 
> > > > > > > In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> > > > > > > from mm/page_alloc.c to mm/internal.h
> > > > > > > 
> > > > > > > In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> > > > > > > so it would have pfn_base=512 and mem_map began with 512th PFN when
> > > > > > > CONFIG_FLATMEM=y.
> > > > > > > But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> > > > > > > less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> > > > > > > prevent accessing an invalid buddy.
> > > > > > > 
> > > > > > > Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Reported-by: zjb194813@alibaba-inc.com
> > > > > > > Reported-by: tianhu.hh@alibaba-inc.com
> > > > > > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > > > > > ---
> > > > > > >     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
> > > > > > >     mm/page_alloc.c     | 37 +++----------------------------------
> > > > > > >     mm/page_isolation.c |  3 ++-
> > > > > > >     3 files changed, 39 insertions(+), 35 deletions(-)
> > > > > > What is the commit id of this in Linus's tree?
> > > > > It is also this one,
> > > > > 
> > > > > commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
> > > > > Author: Zi Yan <ziy@nvidia.com>
> > > > > Date:   Wed Mar 30 15:45:43 2022 -0700
> > > > > 
> > > > >       mm: page_alloc: validate buddy before check its migratetype.
> > > > > 
> > > > >       Whenever a buddy page is found, page_is_buddy() should be called to
> > > > >       check its validity.  Add the missing check during pageblock merge check.
> > > > > 
> > > > >       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> > > > > pageblocks with others")
> > > > >       Link:
> > > > > https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
> > > > >       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
> > > > >       Signed-off-by: Zi Yan <ziy@nvidia.com>
> > > > >       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > > This commit looks nothing like what you posted here.
> > > > 
> > > > Why the vast difference with no explaination as to why these are so
> > > > different from the other backports you provided here?  Also why is the
> > > > subject lines changed?
> > > Yes, the changes of 5.15 are not same with others branches, because we need
> > > additional fix for 5.15,
> > > 
> > > You can check it in the thread:
> > > 
> > > https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
> > > 
> > > Right. But pfn_valid_within() was removed since 5.15. So your fix is
> > > required for kernels between 5.15 and 5.17 (inclusive).
> > What is "your fix" here?
> > 
> > This change differs a lot from what is in Linus's tree now, so this all
> > needs to be resend and fixed up as I mention above if we are going to be
> > able to take this.  As-is, it's all not correct so are dropped.
> 
> I think, for branches except 5.15,  you can just backport Zi Yan's commit
> 787af64d05cd in Linus tree. I won't send more patches further,

So just for 5.18?  I am confused.

> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
> send a new patch as your comments.
> 
> Is it ok for you?

No, please send fixed up patches for all branches you want them applied
to as I do not understand what to do here at all, sorry.

greg k-h
Zi Yan June 20, 2022, 12:25 p.m. UTC | #8
On 20 Jun 2022, at 8:18, Xianting Tian wrote:

> 在 2022/6/20 下午8:06, Greg KH 写道:
>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>> can be fixed in a similar way too.
>>>>>>>
>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>
>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>> CONFIG_FLATMEM=y.
>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>> prevent accessing an invalid buddy.
>>>>>>>
>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>> ---
>>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>     mm/page_isolation.c |  3 ++-
>>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>> What is the commit id of this in Linus's tree?
>>>>> It is also this one,
>>>>>
>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>
>>>>>       mm: page_alloc: validate buddy before check its migratetype.
>>>>>
>>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>       check its validity.  Add the missing check during pageblock merge check.
>>>>>
>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>> pageblocks with others")
>>>>>       Link:
>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>> This commit looks nothing like what you posted here.
>>>>
>>>> Why the vast difference with no explaination as to why these are so
>>>> different from the other backports you provided here?  Also why is the
>>>> subject lines changed?
>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>> additional fix for 5.15,
>>>
>>> You can check it in the thread:
>>>
>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>
>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>> required for kernels between 5.15 and 5.17 (inclusive).
>> What is "your fix" here?
>>
>> This change differs a lot from what is in Linus's tree now, so this all
>> needs to be resend and fixed up as I mention above if we are going to be
>> able to take this.  As-is, it's all not correct so are dropped.
>
> I think, for branches except 5.15,  you can just backport Zi Yan's commit 787af64d05cd in Linus tree. I won't send more patches further,

Please do not back port my commit 787af64d05cd directly, because although
it fixes the issue, the code indentation is not right. At least, I tried
to cherry-pick the commit and failed. The commit just happens to fix
the issue in commit d9dddbf55667.

>
> For 5.15, because it need additional fix except commit 787af64d05cd,  I will send a new patch as your comments.
>
> Is it ok for you?
>
>>
>> thanks,
>>
>> greg k-h

--
Best Regards,
Yan, Zi
Xianting Tian June 20, 2022, 12:41 p.m. UTC | #9
在 2022/6/20 下午8:24, Greg KH 写道:
> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>> can be fixed in a similar way too.
>>>>>>>>
>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>
>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>
>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>      mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>      mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>      mm/page_isolation.c |  3 ++-
>>>>>>>>      3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>> What is the commit id of this in Linus's tree?
>>>>>> It is also this one,
>>>>>>
>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>
>>>>>>        mm: page_alloc: validate buddy before check its migratetype.
>>>>>>
>>>>>>        Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>        check its validity.  Add the missing check during pageblock merge check.
>>>>>>
>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>> pageblocks with others")
>>>>>>        Link:
>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>        Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>        Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> This commit looks nothing like what you posted here.
>>>>>
>>>>> Why the vast difference with no explaination as to why these are so
>>>>> different from the other backports you provided here?  Also why is the
>>>>> subject lines changed?
>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>> additional fix for 5.15,
>>>>
>>>> You can check it in the thread:
>>>>
>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>
>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>> What is "your fix" here?
>>>
>>> This change differs a lot from what is in Linus's tree now, so this all
>>> needs to be resend and fixed up as I mention above if we are going to be
>>> able to take this.  As-is, it's all not correct so are dropped.
>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>> 787af64d05cd in Linus tree. I won't send more patches further,
> So just for 5.18?  I am confused.
Sorry, 5.18 needs the same fix with 5.15.  I will send the patch for it.
>
>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>> send a new patch as your comments.
>>
>> Is it ok for you?
> No, please send fixed up patches for all branches you want them applied
> to as I do not understand what to do here at all, sorry.
Understood. I will send for all branches.
>
> greg k-h
Zi Yan June 20, 2022, 12:45 p.m. UTC | #10
On 20 Jun 2022, at 8:24, Greg KH wrote:

> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>>
>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>> can be fixed in a similar way too.
>>>>>>>>
>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>
>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>
>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>     mm/page_isolation.c |  3 ++-
>>>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>> What is the commit id of this in Linus's tree?
>>>>>> It is also this one,
>>>>>>
>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>
>>>>>>       mm: page_alloc: validate buddy before check its migratetype.
>>>>>>
>>>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>       check its validity.  Add the missing check during pageblock merge check.
>>>>>>
>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>> pageblocks with others")
>>>>>>       Link:
>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> This commit looks nothing like what you posted here.
>>>>>
>>>>> Why the vast difference with no explaination as to why these are so
>>>>> different from the other backports you provided here?  Also why is the
>>>>> subject lines changed?
>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>> additional fix for 5.15,
>>>>
>>>> You can check it in the thread:
>>>>
>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>
>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>> What is "your fix" here?
>>>
>>> This change differs a lot from what is in Linus's tree now, so this all
>>> needs to be resend and fixed up as I mention above if we are going to be
>>> able to take this.  As-is, it's all not correct so are dropped.
>>
>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>> 787af64d05cd in Linus tree. I won't send more patches further,
>
> So just for 5.18?  I am confused.
>
>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>> send a new patch as your comments.
>>
>> Is it ok for you?
>
> No, please send fixed up patches for all branches you want them applied
> to as I do not understand what to do here at all, sorry.

Hi Greg,

The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
fixed by another commit, which was not intended to fix the bug from the commit
d9dddbf55667. These fixes only target the stable branches.

--
Best Regards,
Yan, Zi
Greg Kroah-Hartman June 20, 2022, 12:54 p.m. UTC | #11
On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
> On 20 Jun 2022, at 8:24, Greg KH wrote:
> 
> > On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
> >>
> >> 在 2022/6/20 下午8:06, Greg KH 写道:
> >>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
> >>>> 在 2022/6/20 下午7:42, Greg KH 写道:
> >>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
> >>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
> >>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> >>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> >>>>>>>> can be fixed in a similar way too.
> >>>>>>>>
> >>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> >>>>>>>> from mm/page_alloc.c to mm/internal.h
> >>>>>>>>
> >>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> >>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
> >>>>>>>> CONFIG_FLATMEM=y.
> >>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> >>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> >>>>>>>> prevent accessing an invalid buddy.
> >>>>>>>>
> >>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>>>>>>> Cc: stable@vger.kernel.org
> >>>>>>>> Reported-by: zjb194813@alibaba-inc.com
> >>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
> >>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> >>>>>>>> ---
> >>>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
> >>>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
> >>>>>>>>     mm/page_isolation.c |  3 ++-
> >>>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
> >>>>>>> What is the commit id of this in Linus's tree?
> >>>>>> It is also this one,
> >>>>>>
> >>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
> >>>>>> Author: Zi Yan <ziy@nvidia.com>
> >>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
> >>>>>>
> >>>>>>       mm: page_alloc: validate buddy before check its migratetype.
> >>>>>>
> >>>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
> >>>>>>       check its validity.  Add the missing check during pageblock merge check.
> >>>>>>
> >>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>>>> pageblocks with others")
> >>>>>>       Link:
> >>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
> >>>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
> >>>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>> This commit looks nothing like what you posted here.
> >>>>>
> >>>>> Why the vast difference with no explaination as to why these are so
> >>>>> different from the other backports you provided here?  Also why is the
> >>>>> subject lines changed?
> >>>> Yes, the changes of 5.15 are not same with others branches, because we need
> >>>> additional fix for 5.15,
> >>>>
> >>>> You can check it in the thread:
> >>>>
> >>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
> >>>>
> >>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> >>>> required for kernels between 5.15 and 5.17 (inclusive).
> >>> What is "your fix" here?
> >>>
> >>> This change differs a lot from what is in Linus's tree now, so this all
> >>> needs to be resend and fixed up as I mention above if we are going to be
> >>> able to take this.  As-is, it's all not correct so are dropped.
> >>
> >> I think, for branches except 5.15,  you can just backport Zi Yan's commit
> >> 787af64d05cd in Linus tree. I won't send more patches further,
> >
> > So just for 5.18?  I am confused.
> >
> >> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
> >> send a new patch as your comments.
> >>
> >> Is it ok for you?
> >
> > No, please send fixed up patches for all branches you want them applied
> > to as I do not understand what to do here at all, sorry.
> 
> Hi Greg,
> 
> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
> fixed by another commit, which was not intended to fix the bug from the commit
> d9dddbf55667. These fixes only target the stable branches.

Then that all needs to be documented very very very well as to why we
can't just take the commit that is in Linus's tree.

Why can't we take that commit instead?

thanks,

greg k-h
Zi Yan June 20, 2022, 2:13 p.m. UTC | #12
On 20 Jun 2022, at 8:54, Greg KH wrote:

> On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
>> On 20 Jun 2022, at 8:24, Greg KH wrote:
>>
>>> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>>>>
>>>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>>>> can be fixed in a similar way too.
>>>>>>>>>>
>>>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>>>
>>>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>>>
>>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>>> ---
>>>>>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>>>     mm/page_isolation.c |  3 ++-
>>>>>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>>>> What is the commit id of this in Linus's tree?
>>>>>>>> It is also this one,
>>>>>>>>
>>>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>>>
>>>>>>>>       mm: page_alloc: validate buddy before check its migratetype.
>>>>>>>>
>>>>>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>>>       check its validity.  Add the missing check during pageblock merge check.
>>>>>>>>
>>>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>> pageblocks with others")
>>>>>>>>       Link:
>>>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>> This commit looks nothing like what you posted here.
>>>>>>>
>>>>>>> Why the vast difference with no explaination as to why these are so
>>>>>>> different from the other backports you provided here?  Also why is the
>>>>>>> subject lines changed?
>>>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>>>> additional fix for 5.15,
>>>>>>
>>>>>> You can check it in the thread:
>>>>>>
>>>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>>>
>>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>> What is "your fix" here?
>>>>>
>>>>> This change differs a lot from what is in Linus's tree now, so this all
>>>>> needs to be resend and fixed up as I mention above if we are going to be
>>>>> able to take this.  As-is, it's all not correct so are dropped.
>>>>
>>>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>>>> 787af64d05cd in Linus tree. I won't send more patches further,
>>>
>>> So just for 5.18?  I am confused.
>>>
>>>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>>>> send a new patch as your comments.
>>>>
>>>> Is it ok for you?
>>>
>>> No, please send fixed up patches for all branches you want them applied
>>> to as I do not understand what to do here at all, sorry.
>>
>> Hi Greg,
>>
>> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
>> fixed by another commit, which was not intended to fix the bug from the commit
>> d9dddbf55667. These fixes only target the stable branches.
>
> Then that all needs to be documented very very very well as to why we
> can't just take the commit that is in Linus's tree.
>
> Why can't we take that commit instead?

The situation is a little complicated.

The bug from commit d9dddbf55667 was not discovered back then. The commit 1dd214b8f21c
was trying to get migratetype merging more rigid and made the bug easy to get
hit, but none of us were aware of that the bug also exists in commit d9dddbf55667.
Then the commit 787af64d05cd fixed the bug, but since the original code was
changed by commit 1dd214b8f21c, thus, it does not directly apply to
commit d9dddbf55667. So I do not think it makes sense to use the original commits
1dd214b8f21c and 787af64d05cd, since the former makes a non bug fixing change and
the latter fixes the bug revealed by the former.

As a result, Xianting's patches fix the bug directly, looking more reasonable to me.

--
Best Regards,
Yan, Zi
Greg Kroah-Hartman June 20, 2022, 8:31 p.m. UTC | #13
On Mon, Jun 20, 2022 at 10:13:59AM -0400, Zi Yan wrote:
> On 20 Jun 2022, at 8:54, Greg KH wrote:
> 
> > On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
> >> On 20 Jun 2022, at 8:24, Greg KH wrote:
> >>
> >>> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
> >>>>
> >>>> 在 2022/6/20 下午8:06, Greg KH 写道:
> >>>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
> >>>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
> >>>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
> >>>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
> >>>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
> >>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
> >>>>>>>>>> can be fixed in a similar way too.
> >>>>>>>>>>
> >>>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
> >>>>>>>>>> from mm/page_alloc.c to mm/internal.h
> >>>>>>>>>>
> >>>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
> >>>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
> >>>>>>>>>> CONFIG_FLATMEM=y.
> >>>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
> >>>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
> >>>>>>>>>> prevent accessing an invalid buddy.
> >>>>>>>>>>
> >>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>>>>>>>>> Cc: stable@vger.kernel.org
> >>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
> >>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
> >>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> >>>>>>>>>> ---
> >>>>>>>>>>     mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
> >>>>>>>>>>     mm/page_alloc.c     | 37 +++----------------------------------
> >>>>>>>>>>     mm/page_isolation.c |  3 ++-
> >>>>>>>>>>     3 files changed, 39 insertions(+), 35 deletions(-)
> >>>>>>>>> What is the commit id of this in Linus's tree?
> >>>>>>>> It is also this one,
> >>>>>>>>
> >>>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
> >>>>>>>> Author: Zi Yan <ziy@nvidia.com>
> >>>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
> >>>>>>>>
> >>>>>>>>       mm: page_alloc: validate buddy before check its migratetype.
> >>>>>>>>
> >>>>>>>>       Whenever a buddy page is found, page_is_buddy() should be called to
> >>>>>>>>       check its validity.  Add the missing check during pageblock merge check.
> >>>>>>>>
> >>>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>>>>>> pageblocks with others")
> >>>>>>>>       Link:
> >>>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
> >>>>>>>>       Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
> >>>>>>>>       Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>>>>>>       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>>>> This commit looks nothing like what you posted here.
> >>>>>>>
> >>>>>>> Why the vast difference with no explaination as to why these are so
> >>>>>>> different from the other backports you provided here?  Also why is the
> >>>>>>> subject lines changed?
> >>>>>> Yes, the changes of 5.15 are not same with others branches, because we need
> >>>>>> additional fix for 5.15,
> >>>>>>
> >>>>>> You can check it in the thread:
> >>>>>>
> >>>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
> >>>>>>
> >>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> >>>>>> required for kernels between 5.15 and 5.17 (inclusive).
> >>>>> What is "your fix" here?
> >>>>>
> >>>>> This change differs a lot from what is in Linus's tree now, so this all
> >>>>> needs to be resend and fixed up as I mention above if we are going to be
> >>>>> able to take this.  As-is, it's all not correct so are dropped.
> >>>>
> >>>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
> >>>> 787af64d05cd in Linus tree. I won't send more patches further,
> >>>
> >>> So just for 5.18?  I am confused.
> >>>
> >>>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
> >>>> send a new patch as your comments.
> >>>>
> >>>> Is it ok for you?
> >>>
> >>> No, please send fixed up patches for all branches you want them applied
> >>> to as I do not understand what to do here at all, sorry.
> >>
> >> Hi Greg,
> >>
> >> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
> >> fixed by another commit, which was not intended to fix the bug from the commit
> >> d9dddbf55667. These fixes only target the stable branches.
> >
> > Then that all needs to be documented very very very well as to why we
> > can't just take the commit that is in Linus's tree.
> >
> > Why can't we take that commit instead?
> 
> The situation is a little complicated.
> 
> The bug from commit d9dddbf55667 was not discovered back then. The commit 1dd214b8f21c
> was trying to get migratetype merging more rigid and made the bug easy to get
> hit, but none of us were aware of that the bug also exists in commit d9dddbf55667.
> Then the commit 787af64d05cd fixed the bug, but since the original code was
> changed by commit 1dd214b8f21c, thus, it does not directly apply to
> commit d9dddbf55667. So I do not think it makes sense to use the original commits
> 1dd214b8f21c and 787af64d05cd, since the former makes a non bug fixing change and
> the latter fixes the bug revealed by the former.

That is exactly what we want to apply, we almost never want to apply
stuff that is not upstream.  When we do apply "custom" patches, they are
almost always wrong.  We have a long history of this, please let's just
take the originals please.

> As a result, Xianting's patches fix the bug directly, looking more reasonable to me.

Again, please no, let's take the originals and keep in step with what is
in Linus's tree which makes maintance and tracking and everything so
much easier over time.

thanks,

greg k-h
Xianting Tian June 22, 2022, 1:37 a.m. UTC | #14
在 2022/6/21 上午4:31, Greg KH 写道:
> On Mon, Jun 20, 2022 at 10:13:59AM -0400, Zi Yan wrote:
>> On 20 Jun 2022, at 8:54, Greg KH wrote:
>>
>>> On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
>>>> On 20 Jun 2022, at 8:24, Greg KH wrote:
>>>>
>>>>> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>>>>>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>>>>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>>>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>>>>>> can be fixed in a similar way too.
>>>>>>>>>>>>
>>>>>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>      mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>>>>>      mm/page_isolation.c |  3 ++-
>>>>>>>>>>>>      3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>>>>>> What is the commit id of this in Linus's tree?
>>>>>>>>>> It is also this one,
>>>>>>>>>>
>>>>>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>>>>>
>>>>>>>>>>        mm: page_alloc: validate buddy before check its migratetype.
>>>>>>>>>>
>>>>>>>>>>        Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>>>>>        check its validity.  Add the missing check during pageblock merge check.
>>>>>>>>>>
>>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>> pageblocks with others")
>>>>>>>>>>        Link:
>>>>>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>>>>>        Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>>>>        Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>>>> This commit looks nothing like what you posted here.
>>>>>>>>>
>>>>>>>>> Why the vast difference with no explaination as to why these are so
>>>>>>>>> different from the other backports you provided here?  Also why is the
>>>>>>>>> subject lines changed?
>>>>>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>>>>>> additional fix for 5.15,
>>>>>>>>
>>>>>>>> You can check it in the thread:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>>>>>
>>>>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>>>> What is "your fix" here?
>>>>>>>
>>>>>>> This change differs a lot from what is in Linus's tree now, so this all
>>>>>>> needs to be resend and fixed up as I mention above if we are going to be
>>>>>>> able to take this.  As-is, it's all not correct so are dropped.
>>>>>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>>>>>> 787af64d05cd in Linus tree. I won't send more patches further,
>>>>> So just for 5.18?  I am confused.
>>>>>
>>>>>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>>>>>> send a new patch as your comments.
>>>>>>
>>>>>> Is it ok for you?
>>>>> No, please send fixed up patches for all branches you want them applied
>>>>> to as I do not understand what to do here at all, sorry.
>>>> Hi Greg,
>>>>
>>>> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
>>>> fixed by another commit, which was not intended to fix the bug from the commit
>>>> d9dddbf55667. These fixes only target the stable branches.
>>> Then that all needs to be documented very very very well as to why we
>>> can't just take the commit that is in Linus's tree.
>>>
>>> Why can't we take that commit instead?
>> The situation is a little complicated.
>>
>> The bug from commit d9dddbf55667 was not discovered back then. The commit 1dd214b8f21c
>> was trying to get migratetype merging more rigid and made the bug easy to get
>> hit, but none of us were aware of that the bug also exists in commit d9dddbf55667.
>> Then the commit 787af64d05cd fixed the bug, but since the original code was
>> changed by commit 1dd214b8f21c, thus, it does not directly apply to
>> commit d9dddbf55667. So I do not think it makes sense to use the original commits
>> 1dd214b8f21c and 787af64d05cd, since the former makes a non bug fixing change and
>> the latter fixes the bug revealed by the former.
> That is exactly what we want to apply, we almost never want to apply
> stuff that is not upstream.  When we do apply "custom" patches, they are
> almost always wrong.  We have a long history of this, please let's just
> take the originals please.
>
>> As a result, Xianting's patches fix the bug directly, looking more reasonable to me.
> Again, please no, let's take the originals and keep in step with what is
> in Linus's tree which makes maintance and tracking and everything so
> much easier over time.
If so, I think we only can backport 787af64d from Linus tree to all 
stable branches. Our ultimate purpose is to solve the problem, I think 
@Zi Yan will agree?
>
> thanks,
>
> greg k-h
Zi Yan June 22, 2022, 1:52 a.m. UTC | #15
On 21 Jun 2022, at 21:37, Xianting Tian wrote:

> 在 2022/6/21 上午4:31, Greg KH 写道:
>> On Mon, Jun 20, 2022 at 10:13:59AM -0400, Zi Yan wrote:
>>> On 20 Jun 2022, at 8:54, Greg KH wrote:
>>>
>>>> On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
>>>>> On 20 Jun 2022, at 8:24, Greg KH wrote:
>>>>>
>>>>>> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>>>>>>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>>>>>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>>>>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>>>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>>>>>>> can be fixed in a similar way too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>      mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>      mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>>>>>>      mm/page_isolation.c |  3 ++-
>>>>>>>>>>>>>      3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>>>>>>> What is the commit id of this in Linus's tree?
>>>>>>>>>>> It is also this one,
>>>>>>>>>>>
>>>>>>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>>>>>>
>>>>>>>>>>>        mm: page_alloc: validate buddy before check its migratetype.
>>>>>>>>>>>
>>>>>>>>>>>        Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>>>>>>        check its validity.  Add the missing check during pageblock merge check.
>>>>>>>>>>>
>>>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>>> pageblocks with others")
>>>>>>>>>>>        Link:
>>>>>>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>>>>>>        Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>>>>>        Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>>        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>>>>> This commit looks nothing like what you posted here.
>>>>>>>>>>
>>>>>>>>>> Why the vast difference with no explaination as to why these are so
>>>>>>>>>> different from the other backports you provided here?  Also why is the
>>>>>>>>>> subject lines changed?
>>>>>>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>>>>>>> additional fix for 5.15,
>>>>>>>>>
>>>>>>>>> You can check it in the thread:
>>>>>>>>>
>>>>>>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>>>>>>
>>>>>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>>>>> What is "your fix" here?
>>>>>>>>
>>>>>>>> This change differs a lot from what is in Linus's tree now, so this all
>>>>>>>> needs to be resend and fixed up as I mention above if we are going to be
>>>>>>>> able to take this.  As-is, it's all not correct so are dropped.
>>>>>>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>>>>>>> 787af64d05cd in Linus tree. I won't send more patches further,
>>>>>> So just for 5.18?  I am confused.
>>>>>>
>>>>>>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>>>>>>> send a new patch as your comments.
>>>>>>>
>>>>>>> Is it ok for you?
>>>>>> No, please send fixed up patches for all branches you want them applied
>>>>>> to as I do not understand what to do here at all, sorry.
>>>>> Hi Greg,
>>>>>
>>>>> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
>>>>> fixed by another commit, which was not intended to fix the bug from the commit
>>>>> d9dddbf55667. These fixes only target the stable branches.
>>>> Then that all needs to be documented very very very well as to why we
>>>> can't just take the commit that is in Linus's tree.
>>>>
>>>> Why can't we take that commit instead?
>>> The situation is a little complicated.
>>>
>>> The bug from commit d9dddbf55667 was not discovered back then. The commit 1dd214b8f21c
>>> was trying to get migratetype merging more rigid and made the bug easy to get
>>> hit, but none of us were aware of that the bug also exists in commit d9dddbf55667.
>>> Then the commit 787af64d05cd fixed the bug, but since the original code was
>>> changed by commit 1dd214b8f21c, thus, it does not directly apply to
>>> commit d9dddbf55667. So I do not think it makes sense to use the original commits
>>> 1dd214b8f21c and 787af64d05cd, since the former makes a non bug fixing change and
>>> the latter fixes the bug revealed by the former.
>> That is exactly what we want to apply, we almost never want to apply
>> stuff that is not upstream.  When we do apply "custom" patches, they are
>> almost always wrong.  We have a long history of this, please let's just
>> take the originals please.
>>
>>> As a result, Xianting's patches fix the bug directly, looking more reasonable to me.
>> Again, please no, let's take the originals and keep in step with what is
>> in Linus's tree which makes maintance and tracking and everything so
>> much easier over time.
> If so, I think we only can backport 787af64d from Linus tree to all stable branches. Our ultimate purpose is to solve the problem, I think @Zi Yan will agree?

From my understanding, Greg wants us to backport commit 1dd214b8f21c and
787af64d05cd. Since you cannot take the original 787af64d05cd, which applies
to different code than the old kernel code.

In addition, to fix the issue in mm/page_isolation.c, I think
commit bb0e28eb5bc2 and 8170ac4700d2 are also needed to be backported.

--
Best Regards,
Yan, Zi
Xianting Tian July 6, 2022, 2:51 a.m. UTC | #16
在 2022/6/21 上午4:31, Greg KH 写道:
> On Mon, Jun 20, 2022 at 10:13:59AM -0400, Zi Yan wrote:
>> On 20 Jun 2022, at 8:54, Greg KH wrote:
>>
>>> On Mon, Jun 20, 2022 at 08:45:13AM -0400, Zi Yan wrote:
>>>> On 20 Jun 2022, at 8:24, Greg KH wrote:
>>>>
>>>>> On Mon, Jun 20, 2022 at 08:18:40PM +0800, Xianting Tian wrote:
>>>>>> 在 2022/6/20 下午8:06, Greg KH 写道:
>>>>>>> On Mon, Jun 20, 2022 at 07:57:05PM +0800, Xianting Tian wrote:
>>>>>>>> 在 2022/6/20 下午7:42, Greg KH 写道:
>>>>>>>>> On Mon, Jun 20, 2022 at 06:54:44PM +0800, Xianting Tian wrote:
>>>>>>>>>> 在 2022/6/20 下午6:17, Greg KH 写道:
>>>>>>>>>>> On Fri, Jun 17, 2022 at 12:17:45AM +0800, Xianting Tian wrote:
>>>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>>>> fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that
>>>>>>>>>>>> can be fixed in a similar way too.
>>>>>>>>>>>>
>>>>>>>>>>>> In unset_migratetype_isolate(), we also need the fix, so move page_is_buddy()
>>>>>>>>>>>> from mm/page_alloc.c to mm/internal.h
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, for RISC-V arch the first 2MB RAM could be reserved for opensbi,
>>>>>>>>>>>> so it would have pfn_base=512 and mem_map began with 512th PFN when
>>>>>>>>>>>> CONFIG_FLATMEM=y.
>>>>>>>>>>>> But __find_buddy_pfn algorithm thinks the start pfn 0, it could get 0 pfn or
>>>>>>>>>>>> less than the pfn_base value. We need page_is_buddy() to verify the buddy to
>>>>>>>>>>>> prevent accessing an invalid buddy.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      mm/internal.h       | 34 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>      mm/page_alloc.c     | 37 +++----------------------------------
>>>>>>>>>>>>      mm/page_isolation.c |  3 ++-
>>>>>>>>>>>>      3 files changed, 39 insertions(+), 35 deletions(-)
>>>>>>>>>>> What is the commit id of this in Linus's tree?
>>>>>>>>>> It is also this one,
>>>>>>>>>>
>>>>>>>>>> commit 787af64d05cd528aac9ad16752d11bb1c6061bb9
>>>>>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>>>>>> Date:   Wed Mar 30 15:45:43 2022 -0700
>>>>>>>>>>
>>>>>>>>>>        mm: page_alloc: validate buddy before check its migratetype.
>>>>>>>>>>
>>>>>>>>>>        Whenever a buddy page is found, page_is_buddy() should be called to
>>>>>>>>>>        check its validity.  Add the missing check during pageblock merge check.
>>>>>>>>>>
>>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>> pageblocks with others")
>>>>>>>>>>        Link:
>>>>>>>>>> https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/
>>>>>>>>>>        Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>>>>        Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>>>> This commit looks nothing like what you posted here.
>>>>>>>>>
>>>>>>>>> Why the vast difference with no explaination as to why these are so
>>>>>>>>> different from the other backports you provided here?  Also why is the
>>>>>>>>> subject lines changed?
>>>>>>>> Yes, the changes of 5.15 are not same with others branches, because we need
>>>>>>>> additional fix for 5.15,
>>>>>>>>
>>>>>>>> You can check it in the thread:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/ <https://lore.kernel.org/linux-mm/435B45C3-E6A5-43B2-A5A2-318C748691FC@nvidia.com/>
>>>>>>>>
>>>>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>>>> What is "your fix" here?
>>>>>>>
>>>>>>> This change differs a lot from what is in Linus's tree now, so this all
>>>>>>> needs to be resend and fixed up as I mention above if we are going to be
>>>>>>> able to take this.  As-is, it's all not correct so are dropped.
>>>>>> I think, for branches except 5.15,  you can just backport Zi Yan's commit
>>>>>> 787af64d05cd in Linus tree. I won't send more patches further,
>>>>> So just for 5.18?  I am confused.
>>>>>
>>>>>> For 5.15, because it need additional fix except commit 787af64d05cd,  I will
>>>>>> send a new patch as your comments.
>>>>>>
>>>>>> Is it ok for you?
>>>>> No, please send fixed up patches for all branches you want them applied
>>>>> to as I do not understand what to do here at all, sorry.
>>>> Hi Greg,
>>>>
>>>> The fixes sent by Xianting do not exist in Linus’s tree, since the bug is
>>>> fixed by another commit, which was not intended to fix the bug from the commit
>>>> d9dddbf55667. These fixes only target the stable branches.
>>> Then that all needs to be documented very very very well as to why we
>>> can't just take the commit that is in Linus's tree.
>>>
>>> Why can't we take that commit instead?
>> The situation is a little complicated.
>>
>> The bug from commit d9dddbf55667 was not discovered back then. The commit 1dd214b8f21c
>> was trying to get migratetype merging more rigid and made the bug easy to get
>> hit, but none of us were aware of that the bug also exists in commit d9dddbf55667.
>> Then the commit 787af64d05cd fixed the bug, but since the original code was
>> changed by commit 1dd214b8f21c, thus, it does not directly apply to
>> commit d9dddbf55667. So I do not think it makes sense to use the original commits
>> 1dd214b8f21c and 787af64d05cd, since the former makes a non bug fixing change and
>> the latter fixes the bug revealed by the former.
> That is exactly what we want to apply, we almost never want to apply
> stuff that is not upstream.  When we do apply "custom" patches, they are
> almost always wrong.  We have a long history of this, please let's just
> take the originals please.
>
>> As a result, Xianting's patches fix the bug directly, looking more reasonable to me.
> Again, please no, let's take the originals and keep in step with what is
> in Linus's tree which makes maintance and tracking and everything so
> much easier over time.

Sorry for the dealy as some urgent issues need to handle.

I think I will send two series of patches, one for mm/page_alloc.c, the 
other for mm/page_isolation.c,

Based on Zi Yan's reply below last time, thanks

 From my understanding, Greg wants us to backport commit 1dd214b8f21c and
787af64d05cd. Since you cannot take the original 787af64d05cd, which applies
to different code than the old kernel code.

In addition, to fix the issue in mm/page_isolation.c, I think
commit bb0e28eb5bc2 and 8170ac4700d2 are also needed to be backported.

>
> thanks,
>
> greg k-h
diff mbox series

Patch

diff --git a/mm/internal.h b/mm/internal.h
index cf3cb933eba3..e838d825cfaa 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -340,6 +340,40 @@  static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
+/*
+ * This function checks whether a page is free && is the buddy
+ * we can coalesce a page and its buddy if
+ * (a) the buddy is not in a hole (check before calling!) &&
+ * (b) the buddy is in the buddy system &&
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
+ *
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
+ *
+ * For recording page's order, we use page_private(page).
+ */
+static inline bool page_is_buddy(struct page *page, struct page *buddy,
+							unsigned int order)
+{
+	if (!page_is_guard(buddy) && !PageBuddy(buddy))
+		return false;
+
+	if (buddy_order(buddy) != order)
+		return false;
+
+	/*
+	 * zone check is done late to avoid uselessly calculating
+	 * zone/node ids for pages that could never merge.
+	 */
+	if (page_zone_id(page) != page_zone_id(buddy))
+		return false;
+
+	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
+
+	return true;
+}
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a0b7afae59e9..8a29c0ff1c7b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -875,40 +875,6 @@  static inline void set_buddy_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }
 
-/*
- * This function checks whether a page is free && is the buddy
- * we can coalesce a page and its buddy if
- * (a) the buddy is not in a hole (check before calling!) &&
- * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order &&
- * (d) a page and its buddy are in the same zone.
- *
- * For recording whether a page is in the buddy system, we set PageBuddy.
- * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
- *
- * For recording page's order, we use page_private(page).
- */
-static inline bool page_is_buddy(struct page *page, struct page *buddy,
-							unsigned int order)
-{
-	if (!page_is_guard(buddy) && !PageBuddy(buddy))
-		return false;
-
-	if (buddy_order(buddy) != order)
-		return false;
-
-	/*
-	 * zone check is done late to avoid uselessly calculating
-	 * zone/node ids for pages that could never merge.
-	 */
-	if (page_zone_id(page) != page_zone_id(buddy))
-		return false;
-
-	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
-
-	return true;
-}
-
 #ifdef CONFIG_COMPACTION
 static inline struct capture_control *task_capc(struct zone *zone)
 {
@@ -1118,6 +1084,9 @@  static inline void __free_one_page(struct page *page,
 
 			buddy_pfn = __find_buddy_pfn(pfn, order);
 			buddy = page + (buddy_pfn - pfn);
+
+			if (!page_is_buddy(page, buddy, order))
+				goto done_merging;
 			buddy_mt = get_pageblock_migratetype(buddy);
 
 			if (migratetype != buddy_mt
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index a95c2c6562d0..70c1870e786b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -93,7 +93,8 @@  static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 			buddy_pfn = __find_buddy_pfn(pfn, order);
 			buddy = page + (buddy_pfn - pfn);
 
-			if (!is_migrate_isolate_page(buddy)) {
+			if (page_is_buddy(page, buddy, order) &&
+			    !is_migrate_isolate_page(buddy)) {
 				__isolate_free_page(page, order);
 				isolated_page = true;
 			}