diff mbox series

[v2,1/3] mm: Trigger bug on if a section is not found in __section_nr

Message ID 20190626061124.16013-2-alastair@au1.ibm.com (mailing list archive)
State New, archived
Headers show
Series mm: Cleanup & allow modules to hotplug memory | expand

Commit Message

Alastair D'Silva June 26, 2019, 6:11 a.m. UTC
From: Alastair D'Silva <alastair@d-silva.org>

If a memory section comes in where the physical address is greater than
that which is managed by the kernel, this function would not trigger the
bug and instead return a bogus section number.

This patch tracks whether the section was actually found, and triggers the
bug if not.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
 drivers/base/memory.c | 18 +++++++++++++++---
 mm/sparse.c           |  7 ++++++-
 2 files changed, 21 insertions(+), 4 deletions(-)

Comments

Michal Hocko June 26, 2019, 6:21 a.m. UTC | #1
On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> If a memory section comes in where the physical address is greater than
> that which is managed by the kernel, this function would not trigger the
> bug and instead return a bogus section number.
> 
> This patch tracks whether the section was actually found, and triggers the
> bug if not.

Why do we want/need that? In other words the changelog should contina
WHY and WHAT. This one contains only the later one.
 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>  drivers/base/memory.c | 18 +++++++++++++++---
>  mm/sparse.c           |  7 ++++++-
>  2 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f180427e48f4..9244c122abf1 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -585,13 +585,21 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>  struct memory_block *find_memory_block_hinted(struct mem_section *section,
>  					      struct memory_block *hint)
>  {
> -	int block_id = base_memory_block_id(__section_nr(section));
> +	int block_id, section_nr;
>  	struct device *hintdev = hint ? &hint->dev : NULL;
>  	struct device *dev;
>  
> +	section_nr = __section_nr(section);
> +	if (section_nr < 0) {
> +		if (hintdev)
> +			put_device(hintdev);
> +		return NULL;
> +	}
> +
> +	block_id = base_memory_block_id(section_nr);
>  	dev = subsys_find_device_by_id(&memory_subsys, block_id, hintdev);
> -	if (hint)
> -		put_device(&hint->dev);
> +	if (hintdev)
> +		put_device(hintdev);
>  	if (!dev)
>  		return NULL;
>  	return to_memory_block(dev);
> @@ -664,6 +672,10 @@ static int init_memory_block(struct memory_block **memory,
>  		return -ENOMEM;
>  
>  	scn_nr = __section_nr(section);
> +
> +	if (scn_nr < 0)
> +		return scn_nr;
> +
>  	mem->start_section_nr =
>  			base_memory_block_id(scn_nr) * sections_per_block;
>  	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fd13166949b5..57a1a3d9c1cf 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -113,10 +113,15 @@ int __section_nr(struct mem_section* ms)
>  			continue;
>  
>  		if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
> -		     break;
> +			break;
>  	}
>  
>  	VM_BUG_ON(!root);
> +	if (root_nr == NR_SECTION_ROOTS) {
> +		VM_BUG_ON(true);
> +
> +		return -EINVAL;
> +	}
>  
>  	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
>  }
> -- 
> 2.21.0
Alastair D'Silva June 26, 2019, 6:27 a.m. UTC | #2
On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > From: Alastair D'Silva <alastair@d-silva.org>
> > 
> > If a memory section comes in where the physical address is greater
> > than
> > that which is managed by the kernel, this function would not
> > trigger the
> > bug and instead return a bogus section number.
> > 
> > This patch tracks whether the section was actually found, and
> > triggers the
> > bug if not.
> 
> Why do we want/need that? In other words the changelog should contina
> WHY and WHAT. This one contains only the later one.
>  

Thanks, I'll update the comment.

During driver development, I tried adding peristent memory at a memory
address that exceeded the maximum permissable address for the platform.

This caused __section_nr to silently return bogus section numbers,
rather than complaining.
Michal Hocko June 26, 2019, 6:57 a.m. UTC | #3
On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > From: Alastair D'Silva <alastair@d-silva.org>
> > > 
> > > If a memory section comes in where the physical address is greater
> > > than
> > > that which is managed by the kernel, this function would not
> > > trigger the
> > > bug and instead return a bogus section number.
> > > 
> > > This patch tracks whether the section was actually found, and
> > > triggers the
> > > bug if not.
> > 
> > Why do we want/need that? In other words the changelog should contina
> > WHY and WHAT. This one contains only the later one.
> >  
> 
> Thanks, I'll update the comment.
> 
> During driver development, I tried adding peristent memory at a memory
> address that exceeded the maximum permissable address for the platform.
> 
> This caused __section_nr to silently return bogus section numbers,
> rather than complaining.

OK, I see, but is an additional code worth it for the non-development
case? I mean why should we be testing for something that shouldn't
happen normally? Is it too easy to get things wrong or what is the
underlying reason to change it now?
Alastair D'Silva June 27, 2019, 12:50 a.m. UTC | #4
On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > From: Alastair D'Silva <alastair@d-silva.org>
> > > > 
> > > > If a memory section comes in where the physical address is
> > > > greater
> > > > than
> > > > that which is managed by the kernel, this function would not
> > > > trigger the
> > > > bug and instead return a bogus section number.
> > > > 
> > > > This patch tracks whether the section was actually found, and
> > > > triggers the
> > > > bug if not.
> > > 
> > > Why do we want/need that? In other words the changelog should
> > > contina
> > > WHY and WHAT. This one contains only the later one.
> > >  
> > 
> > Thanks, I'll update the comment.
> > 
> > During driver development, I tried adding peristent memory at a
> > memory
> > address that exceeded the maximum permissable address for the
> > platform.
> > 
> > This caused __section_nr to silently return bogus section numbers,
> > rather than complaining.
> 
> OK, I see, but is an additional code worth it for the non-development
> case? I mean why should we be testing for something that shouldn't
> happen normally? Is it too easy to get things wrong or what is the
> underlying reason to change it now?
> 

It took me a while to identify what the problem was - having the BUG_ON
would have saved me a few hours.

I'm happy to just have the BUG_ON 'nd drop the new error return (I
added that in response to Mike Rapoport's comment that the original
patch would still return a bogus section number).
Michal Hocko June 27, 2019, 8:10 a.m. UTC | #5
On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
> On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > > From: Alastair D'Silva <alastair@d-silva.org>
> > > > > 
> > > > > If a memory section comes in where the physical address is
> > > > > greater
> > > > > than
> > > > > that which is managed by the kernel, this function would not
> > > > > trigger the
> > > > > bug and instead return a bogus section number.
> > > > > 
> > > > > This patch tracks whether the section was actually found, and
> > > > > triggers the
> > > > > bug if not.
> > > > 
> > > > Why do we want/need that? In other words the changelog should
> > > > contina
> > > > WHY and WHAT. This one contains only the later one.
> > > >  
> > > 
> > > Thanks, I'll update the comment.
> > > 
> > > During driver development, I tried adding peristent memory at a
> > > memory
> > > address that exceeded the maximum permissable address for the
> > > platform.
> > > 
> > > This caused __section_nr to silently return bogus section numbers,
> > > rather than complaining.
> > 
> > OK, I see, but is an additional code worth it for the non-development
> > case? I mean why should we be testing for something that shouldn't
> > happen normally? Is it too easy to get things wrong or what is the
> > underlying reason to change it now?
> > 
> 
> It took me a while to identify what the problem was - having the BUG_ON
> would have saved me a few hours.
> 
> I'm happy to just have the BUG_ON 'nd drop the new error return (I
> added that in response to Mike Rapoport's comment that the original
> patch would still return a bogus section number).

Well, BUG_ON is about the worst way to handle an incorrect input. You
really do not want to put a production environment down just because
there is a bug in a driver, right? There are still many {VM_}BUG_ONs
in the tree and there is a general trend to get rid of many of them
rather than adding new ones.

Now back to your patch. You are adding an error handling where we simply
do not expect invalid data. This is often the case for the core kernel
functionality because we do expect consumers of the code to do the right
thing. E.g. __section_nr already takes a pointer to struct section which
assumes that this core data structure is already valid. Adding a check
here adds an unnecessary overhead so this doesn't really sound like a
good idea to me.
Alastair D'Silva June 28, 2019, 12:46 a.m. UTC | #6
On Thu, 2019-06-27 at 10:10 +0200, Michal Hocko wrote:
> On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
> > On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> > > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > > > From: Alastair D'Silva <alastair@d-silva.org>
> > > > > > 
> > > > > > If a memory section comes in where the physical address is
> > > > > > greater
> > > > > > than
> > > > > > that which is managed by the kernel, this function would
> > > > > > not
> > > > > > trigger the
> > > > > > bug and instead return a bogus section number.
> > > > > > 
> > > > > > This patch tracks whether the section was actually found,
> > > > > > and
> > > > > > triggers the
> > > > > > bug if not.
> > > > > 
> > > > > Why do we want/need that? In other words the changelog should
> > > > > contina
> > > > > WHY and WHAT. This one contains only the later one.
> > > > >  
> > > > 
> > > > Thanks, I'll update the comment.
> > > > 
> > > > During driver development, I tried adding peristent memory at a
> > > > memory
> > > > address that exceeded the maximum permissable address for the
> > > > platform.
> > > > 
> > > > This caused __section_nr to silently return bogus section
> > > > numbers,
> > > > rather than complaining.
> > > 
> > > OK, I see, but is an additional code worth it for the non-
> > > development
> > > case? I mean why should we be testing for something that
> > > shouldn't
> > > happen normally? Is it too easy to get things wrong or what is
> > > the
> > > underlying reason to change it now?
> > > 
> > 
> > It took me a while to identify what the problem was - having the
> > BUG_ON
> > would have saved me a few hours.
> > 
> > I'm happy to just have the BUG_ON 'nd drop the new error return (I
> > added that in response to Mike Rapoport's comment that the original
> > patch would still return a bogus section number).
> 
> Well, BUG_ON is about the worst way to handle an incorrect input. You
> really do not want to put a production environment down just because
> there is a bug in a driver, right? There are still many {VM_}BUG_ONs
> in the tree and there is a general trend to get rid of many of them
> rather than adding new ones.
> 
> Now back to your patch. You are adding an error handling where we
> simply
> do not expect invalid data. This is often the case for the core
> kernel
> functionality because we do expect consumers of the code to do the
> right
> thing. E.g. __section_nr already takes a pointer to struct section
> which
> assumes that this core data structure is already valid. Adding a
> check
> here adds an unnecessary overhead so this doesn't really sound like a
> good idea to me.


Thanks for providing a good explanation.

In this situation, since we return a bogus section number, we get
crashes elsewhere in the kernel if the code rumbles on.

Given that there is already a VM_BUG_ON in the code, how do you feel
about broadening the scope from 'VM_BUG_ON(!root)' to 'VM_BUG_ON(!root
|| (root_nr == NR_SECTION_ROOTS))'?
David Hildenbrand June 28, 2019, 11:37 a.m. UTC | #7
On 27.06.19 10:10, Michal Hocko wrote:
> On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
>> On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
>>> On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
>>>> On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
>>>>> On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
>>>>>> From: Alastair D'Silva <alastair@d-silva.org>
>>>>>>
>>>>>> If a memory section comes in where the physical address is
>>>>>> greater
>>>>>> than
>>>>>> that which is managed by the kernel, this function would not
>>>>>> trigger the
>>>>>> bug and instead return a bogus section number.
>>>>>>
>>>>>> This patch tracks whether the section was actually found, and
>>>>>> triggers the
>>>>>> bug if not.
>>>>>
>>>>> Why do we want/need that? In other words the changelog should
>>>>> contina
>>>>> WHY and WHAT. This one contains only the later one.
>>>>>  
>>>>
>>>> Thanks, I'll update the comment.
>>>>
>>>> During driver development, I tried adding peristent memory at a
>>>> memory
>>>> address that exceeded the maximum permissable address for the
>>>> platform.
>>>>
>>>> This caused __section_nr to silently return bogus section numbers,
>>>> rather than complaining.
>>>
>>> OK, I see, but is an additional code worth it for the non-development
>>> case? I mean why should we be testing for something that shouldn't
>>> happen normally? Is it too easy to get things wrong or what is the
>>> underlying reason to change it now?
>>>
>>
>> It took me a while to identify what the problem was - having the BUG_ON
>> would have saved me a few hours.
>>
>> I'm happy to just have the BUG_ON 'nd drop the new error return (I
>> added that in response to Mike Rapoport's comment that the original
>> patch would still return a bogus section number).
> 
> Well, BUG_ON is about the worst way to handle an incorrect input. You
> really do not want to put a production environment down just because
> there is a bug in a driver, right? There are still many {VM_}BUG_ONs
> in the tree and there is a general trend to get rid of many of them
> rather than adding new ones.

VM_BUG_ON is only really active with CONFIG_DEBUG_VM. On
!CONFIG_DEBUG_VM it translated to BUILD_BUG_ON_INVALID(), which is a
compile-time only check.

Or am I missing something?
Michal Hocko June 28, 2019, 11:58 a.m. UTC | #8
On Fri 28-06-19 13:37:07, David Hildenbrand wrote:
> VM_BUG_ON is only really active with CONFIG_DEBUG_VM. On
> !CONFIG_DEBUG_VM it translated to BUILD_BUG_ON_INVALID(), which is a
> compile-time only check.
> 
> Or am I missing something?

You are not missing anything.
Michal Hocko July 1, 2019, 10:46 a.m. UTC | #9
On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
[...]
> Given that there is already a VM_BUG_ON in the code, how do you feel
> about broadening the scope from 'VM_BUG_ON(!root)' to 'VM_BUG_ON(!root
> || (root_nr == NR_SECTION_ROOTS))'?

As far as I understand the existing VM_BUG_ON will hit when the
mem_section tree gets corrupted. This is a different situation to an
incorrect section given so I wouldn't really mix those two. And I still
do not see much point to protect from unexpected input parameter as this
is internal function as already pointed out.
Alastair D'Silva July 2, 2019, 4:13 a.m. UTC | #10
On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote:
> On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
> [...]
> > Given that there is already a VM_BUG_ON in the code, how do you
> > feel
> > about broadening the scope from 'VM_BUG_ON(!root)' to
> > 'VM_BUG_ON(!root
> > > > (root_nr == NR_SECTION_ROOTS))'?
> 
> As far as I understand the existing VM_BUG_ON will hit when the
> mem_section tree gets corrupted. This is a different situation to an
> incorrect section given so I wouldn't really mix those two. And I
> still
> do not see much point to protect from unexpected input parameter as
> this
> is internal function as already pointed out.
> 

Hi Michael,

I was able to hit this problem as the system firmware had assigned the
prototype pmem device an address range above the 128TB limit that we
originally supported. This has since been lifted to 2PB with patch
4ffe713b7587b14695c9bec26a000fc88ef54895.

As it stands, we cannot move this range lower as the high bits are
dictated by the location the card is connected.

Since the physical address of the memory is not controlled by the
kernel, I believe we should catch (or at least make it easy to debug)
the sitution where external firmware allocates physical addresses
beyond that which the kernel supports.
Michal Hocko July 2, 2019, 6:13 a.m. UTC | #11
On Tue 02-07-19 14:13:25, Alastair D'Silva wrote:
> On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote:
> > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
> > [...]
> > > Given that there is already a VM_BUG_ON in the code, how do you
> > > feel
> > > about broadening the scope from 'VM_BUG_ON(!root)' to
> > > 'VM_BUG_ON(!root
> > > > > (root_nr == NR_SECTION_ROOTS))'?
> > 
> > As far as I understand the existing VM_BUG_ON will hit when the
> > mem_section tree gets corrupted. This is a different situation to an
> > incorrect section given so I wouldn't really mix those two. And I
> > still
> > do not see much point to protect from unexpected input parameter as
> > this
> > is internal function as already pointed out.
> > 
> 
> Hi Michael,
> 
> I was able to hit this problem as the system firmware had assigned the
> prototype pmem device an address range above the 128TB limit that we
> originally supported. This has since been lifted to 2PB with patch
> 4ffe713b7587b14695c9bec26a000fc88ef54895.
> 
> As it stands, we cannot move this range lower as the high bits are
> dictated by the location the card is connected.
> 
> Since the physical address of the memory is not controlled by the
> kernel, I believe we should catch (or at least make it easy to debug)
> the sitution where external firmware allocates physical addresses
> beyond that which the kernel supports.

Just make it clear, I am not against a sanitization. I am objecting to
put it into __section_nr because this is way too late. As already
explained, you already must have a bogus mem_section object in hand.
Why cannot you add a sanity check right there when the memory is added?
Either when the section is registered or even sooner in arch_add_memory.
Alastair D'Silva July 2, 2019, 6:16 a.m. UTC | #12
On Tue, 2019-07-02 at 08:13 +0200, Michal Hocko wrote:
> On Tue 02-07-19 14:13:25, Alastair D'Silva wrote:
> > On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote:
> > > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
> > > [...]
> > > > Given that there is already a VM_BUG_ON in the code, how do you
> > > > feel
> > > > about broadening the scope from 'VM_BUG_ON(!root)' to
> > > > 'VM_BUG_ON(!root
> > > > > > (root_nr == NR_SECTION_ROOTS))'?
> > > 
> > > As far as I understand the existing VM_BUG_ON will hit when the
> > > mem_section tree gets corrupted. This is a different situation to
> > > an
> > > incorrect section given so I wouldn't really mix those two. And I
> > > still
> > > do not see much point to protect from unexpected input parameter
> > > as
> > > this
> > > is internal function as already pointed out.
> > > 
> > 
> > Hi Michael,
> > 
> > I was able to hit this problem as the system firmware had assigned
> > the
> > prototype pmem device an address range above the 128TB limit that
> > we
> > originally supported. This has since been lifted to 2PB with patch
> > 4ffe713b7587b14695c9bec26a000fc88ef54895.
> > 
> > As it stands, we cannot move this range lower as the high bits are
> > dictated by the location the card is connected.
> > 
> > Since the physical address of the memory is not controlled by the
> > kernel, I believe we should catch (or at least make it easy to
> > debug)
> > the sitution where external firmware allocates physical addresses
> > beyond that which the kernel supports.
> 
> Just make it clear, I am not against a sanitization. I am objecting
> to
> put it into __section_nr because this is way too late. As already
> explained, you already must have a bogus mem_section object in hand.
> Why cannot you add a sanity check right there when the memory is
> added?
> Either when the section is registered or even sooner in
> arch_add_memory.
> 

Good point, I was thinking of a libnvdimm enhancement to check that the
end address is in range, but a more generic solution is better.
diff mbox series

Patch

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f180427e48f4..9244c122abf1 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -585,13 +585,21 @@  int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 struct memory_block *find_memory_block_hinted(struct mem_section *section,
 					      struct memory_block *hint)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
+	int block_id, section_nr;
 	struct device *hintdev = hint ? &hint->dev : NULL;
 	struct device *dev;
 
+	section_nr = __section_nr(section);
+	if (section_nr < 0) {
+		if (hintdev)
+			put_device(hintdev);
+		return NULL;
+	}
+
+	block_id = base_memory_block_id(section_nr);
 	dev = subsys_find_device_by_id(&memory_subsys, block_id, hintdev);
-	if (hint)
-		put_device(&hint->dev);
+	if (hintdev)
+		put_device(hintdev);
 	if (!dev)
 		return NULL;
 	return to_memory_block(dev);
@@ -664,6 +672,10 @@  static int init_memory_block(struct memory_block **memory,
 		return -ENOMEM;
 
 	scn_nr = __section_nr(section);
+
+	if (scn_nr < 0)
+		return scn_nr;
+
 	mem->start_section_nr =
 			base_memory_block_id(scn_nr) * sections_per_block;
 	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..57a1a3d9c1cf 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -113,10 +113,15 @@  int __section_nr(struct mem_section* ms)
 			continue;
 
 		if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
-		     break;
+			break;
 	}
 
 	VM_BUG_ON(!root);
+	if (root_nr == NR_SECTION_ROOTS) {
+		VM_BUG_ON(true);
+
+		return -EINVAL;
+	}
 
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }