Message ID | 20190626061124.16013-2-alastair@au1.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: Cleanup & allow modules to hotplug memory | expand |
On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > From: Alastair D'Silva <alastair@d-silva.org> > > If a memory section comes in where the physical address is greater than > that which is managed by the kernel, this function would not trigger the > bug and instead return a bogus section number. > > This patch tracks whether the section was actually found, and triggers the > bug if not. Why do we want/need that? In other words the changelog should contina WHY and WHAT. This one contains only the later one. > Signed-off-by: Alastair D'Silva <alastair@d-silva.org> > --- > drivers/base/memory.c | 18 +++++++++++++++--- > mm/sparse.c | 7 ++++++- > 2 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index f180427e48f4..9244c122abf1 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -585,13 +585,21 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn) > struct memory_block *find_memory_block_hinted(struct mem_section *section, > struct memory_block *hint) > { > - int block_id = base_memory_block_id(__section_nr(section)); > + int block_id, section_nr; > struct device *hintdev = hint ? &hint->dev : NULL; > struct device *dev; > > + section_nr = __section_nr(section); > + if (section_nr < 0) { > + if (hintdev) > + put_device(hintdev); > + return NULL; > + } > + > + block_id = base_memory_block_id(section_nr); > dev = subsys_find_device_by_id(&memory_subsys, block_id, hintdev); > - if (hint) > - put_device(&hint->dev); > + if (hintdev) > + put_device(hintdev); > if (!dev) > return NULL; > return to_memory_block(dev); > @@ -664,6 +672,10 @@ static int init_memory_block(struct memory_block **memory, > return -ENOMEM; > > scn_nr = __section_nr(section); > + > + if (scn_nr < 0) > + return scn_nr; > + > mem->start_section_nr = > base_memory_block_id(scn_nr) * sections_per_block; > mem->end_section_nr = mem->start_section_nr + sections_per_block - 1; > diff --git a/mm/sparse.c b/mm/sparse.c > index fd13166949b5..57a1a3d9c1cf 100644 > --- a/mm/sparse.c > +++ b/mm/sparse.c > @@ -113,10 +113,15 @@ int __section_nr(struct mem_section* ms) > continue; > > if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT))) > - break; > + break; > } > > VM_BUG_ON(!root); > + if (root_nr == NR_SECTION_ROOTS) { > + VM_BUG_ON(true); > + > + return -EINVAL; > + } > > return (root_nr * SECTIONS_PER_ROOT) + (ms - root); > } > -- > 2.21.0
On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > > From: Alastair D'Silva <alastair@d-silva.org> > > > > If a memory section comes in where the physical address is greater > > than > > that which is managed by the kernel, this function would not > > trigger the > > bug and instead return a bogus section number. > > > > This patch tracks whether the section was actually found, and > > triggers the > > bug if not. > > Why do we want/need that? In other words the changelog should contina > WHY and WHAT. This one contains only the later one. > Thanks, I'll update the comment. During driver development, I tried adding peristent memory at a memory address that exceeded the maximum permissable address for the platform. This caused __section_nr to silently return bogus section numbers, rather than complaining.
On Wed 26-06-19 16:27:30, Alastair D'Silva wrote: > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > > > From: Alastair D'Silva <alastair@d-silva.org> > > > > > > If a memory section comes in where the physical address is greater > > > than > > > that which is managed by the kernel, this function would not > > > trigger the > > > bug and instead return a bogus section number. > > > > > > This patch tracks whether the section was actually found, and > > > triggers the > > > bug if not. > > > > Why do we want/need that? In other words the changelog should contina > > WHY and WHAT. This one contains only the later one. > > > > Thanks, I'll update the comment. > > During driver development, I tried adding peristent memory at a memory > address that exceeded the maximum permissable address for the platform. > > This caused __section_nr to silently return bogus section numbers, > rather than complaining. OK, I see, but is an additional code worth it for the non-development case? I mean why should we be testing for something that shouldn't happen normally? Is it too easy to get things wrong or what is the underlying reason to change it now?
On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote: > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote: > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > > > > From: Alastair D'Silva <alastair@d-silva.org> > > > > > > > > If a memory section comes in where the physical address is > > > > greater > > > > than > > > > that which is managed by the kernel, this function would not > > > > trigger the > > > > bug and instead return a bogus section number. > > > > > > > > This patch tracks whether the section was actually found, and > > > > triggers the > > > > bug if not. > > > > > > Why do we want/need that? In other words the changelog should > > > contina > > > WHY and WHAT. This one contains only the later one. > > > > > > > Thanks, I'll update the comment. > > > > During driver development, I tried adding peristent memory at a > > memory > > address that exceeded the maximum permissable address for the > > platform. > > > > This caused __section_nr to silently return bogus section numbers, > > rather than complaining. > > OK, I see, but is an additional code worth it for the non-development > case? I mean why should we be testing for something that shouldn't > happen normally? Is it too easy to get things wrong or what is the > underlying reason to change it now? > It took me a while to identify what the problem was - having the BUG_ON would have saved me a few hours. I'm happy to just have the BUG_ON 'nd drop the new error return (I added that in response to Mike Rapoport's comment that the original patch would still return a bogus section number).
On Thu 27-06-19 10:50:57, Alastair D'Silva wrote: > On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote: > > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote: > > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: > > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > > > > > From: Alastair D'Silva <alastair@d-silva.org> > > > > > > > > > > If a memory section comes in where the physical address is > > > > > greater > > > > > than > > > > > that which is managed by the kernel, this function would not > > > > > trigger the > > > > > bug and instead return a bogus section number. > > > > > > > > > > This patch tracks whether the section was actually found, and > > > > > triggers the > > > > > bug if not. > > > > > > > > Why do we want/need that? In other words the changelog should > > > > contina > > > > WHY and WHAT. This one contains only the later one. > > > > > > > > > > Thanks, I'll update the comment. > > > > > > During driver development, I tried adding peristent memory at a > > > memory > > > address that exceeded the maximum permissable address for the > > > platform. > > > > > > This caused __section_nr to silently return bogus section numbers, > > > rather than complaining. > > > > OK, I see, but is an additional code worth it for the non-development > > case? I mean why should we be testing for something that shouldn't > > happen normally? Is it too easy to get things wrong or what is the > > underlying reason to change it now? > > > > It took me a while to identify what the problem was - having the BUG_ON > would have saved me a few hours. > > I'm happy to just have the BUG_ON 'nd drop the new error return (I > added that in response to Mike Rapoport's comment that the original > patch would still return a bogus section number). Well, BUG_ON is about the worst way to handle an incorrect input. You really do not want to put a production environment down just because there is a bug in a driver, right? There are still many {VM_}BUG_ONs in the tree and there is a general trend to get rid of many of them rather than adding new ones. Now back to your patch. You are adding an error handling where we simply do not expect invalid data. This is often the case for the core kernel functionality because we do expect consumers of the code to do the right thing. E.g. __section_nr already takes a pointer to struct section which assumes that this core data structure is already valid. Adding a check here adds an unnecessary overhead so this doesn't really sound like a good idea to me.
On Thu, 2019-06-27 at 10:10 +0200, Michal Hocko wrote: > On Thu 27-06-19 10:50:57, Alastair D'Silva wrote: > > On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote: > > > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote: > > > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: > > > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: > > > > > > From: Alastair D'Silva <alastair@d-silva.org> > > > > > > > > > > > > If a memory section comes in where the physical address is > > > > > > greater > > > > > > than > > > > > > that which is managed by the kernel, this function would > > > > > > not > > > > > > trigger the > > > > > > bug and instead return a bogus section number. > > > > > > > > > > > > This patch tracks whether the section was actually found, > > > > > > and > > > > > > triggers the > > > > > > bug if not. > > > > > > > > > > Why do we want/need that? In other words the changelog should > > > > > contina > > > > > WHY and WHAT. This one contains only the later one. > > > > > > > > > > > > > Thanks, I'll update the comment. > > > > > > > > During driver development, I tried adding peristent memory at a > > > > memory > > > > address that exceeded the maximum permissable address for the > > > > platform. > > > > > > > > This caused __section_nr to silently return bogus section > > > > numbers, > > > > rather than complaining. > > > > > > OK, I see, but is an additional code worth it for the non- > > > development > > > case? I mean why should we be testing for something that > > > shouldn't > > > happen normally? Is it too easy to get things wrong or what is > > > the > > > underlying reason to change it now? > > > > > > > It took me a while to identify what the problem was - having the > > BUG_ON > > would have saved me a few hours. > > > > I'm happy to just have the BUG_ON 'nd drop the new error return (I > > added that in response to Mike Rapoport's comment that the original > > patch would still return a bogus section number). > > Well, BUG_ON is about the worst way to handle an incorrect input. You > really do not want to put a production environment down just because > there is a bug in a driver, right? There are still many {VM_}BUG_ONs > in the tree and there is a general trend to get rid of many of them > rather than adding new ones. > > Now back to your patch. You are adding an error handling where we > simply > do not expect invalid data. This is often the case for the core > kernel > functionality because we do expect consumers of the code to do the > right > thing. E.g. __section_nr already takes a pointer to struct section > which > assumes that this core data structure is already valid. Adding a > check > here adds an unnecessary overhead so this doesn't really sound like a > good idea to me. Thanks for providing a good explanation. In this situation, since we return a bogus section number, we get crashes elsewhere in the kernel if the code rumbles on. Given that there is already a VM_BUG_ON in the code, how do you feel about broadening the scope from 'VM_BUG_ON(!root)' to 'VM_BUG_ON(!root || (root_nr == NR_SECTION_ROOTS))'?
On 27.06.19 10:10, Michal Hocko wrote: > On Thu 27-06-19 10:50:57, Alastair D'Silva wrote: >> On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote: >>> On Wed 26-06-19 16:27:30, Alastair D'Silva wrote: >>>> On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote: >>>>> On Wed 26-06-19 16:11:21, Alastair D'Silva wrote: >>>>>> From: Alastair D'Silva <alastair@d-silva.org> >>>>>> >>>>>> If a memory section comes in where the physical address is >>>>>> greater >>>>>> than >>>>>> that which is managed by the kernel, this function would not >>>>>> trigger the >>>>>> bug and instead return a bogus section number. >>>>>> >>>>>> This patch tracks whether the section was actually found, and >>>>>> triggers the >>>>>> bug if not. >>>>> >>>>> Why do we want/need that? In other words the changelog should >>>>> contina >>>>> WHY and WHAT. This one contains only the later one. >>>>> >>>> >>>> Thanks, I'll update the comment. >>>> >>>> During driver development, I tried adding peristent memory at a >>>> memory >>>> address that exceeded the maximum permissable address for the >>>> platform. >>>> >>>> This caused __section_nr to silently return bogus section numbers, >>>> rather than complaining. >>> >>> OK, I see, but is an additional code worth it for the non-development >>> case? I mean why should we be testing for something that shouldn't >>> happen normally? Is it too easy to get things wrong or what is the >>> underlying reason to change it now? >>> >> >> It took me a while to identify what the problem was - having the BUG_ON >> would have saved me a few hours. >> >> I'm happy to just have the BUG_ON 'nd drop the new error return (I >> added that in response to Mike Rapoport's comment that the original >> patch would still return a bogus section number). > > Well, BUG_ON is about the worst way to handle an incorrect input. You > really do not want to put a production environment down just because > there is a bug in a driver, right? There are still many {VM_}BUG_ONs > in the tree and there is a general trend to get rid of many of them > rather than adding new ones. VM_BUG_ON is only really active with CONFIG_DEBUG_VM. On !CONFIG_DEBUG_VM it translated to BUILD_BUG_ON_INVALID(), which is a compile-time only check. Or am I missing something?
On Fri 28-06-19 13:37:07, David Hildenbrand wrote: > VM_BUG_ON is only really active with CONFIG_DEBUG_VM. On > !CONFIG_DEBUG_VM it translated to BUILD_BUG_ON_INVALID(), which is a > compile-time only check. > > Or am I missing something? You are not missing anything.
On Fri 28-06-19 10:46:28, Alastair D'Silva wrote: [...] > Given that there is already a VM_BUG_ON in the code, how do you feel > about broadening the scope from 'VM_BUG_ON(!root)' to 'VM_BUG_ON(!root > || (root_nr == NR_SECTION_ROOTS))'? As far as I understand the existing VM_BUG_ON will hit when the mem_section tree gets corrupted. This is a different situation to an incorrect section given so I wouldn't really mix those two. And I still do not see much point to protect from unexpected input parameter as this is internal function as already pointed out.
On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote: > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote: > [...] > > Given that there is already a VM_BUG_ON in the code, how do you > > feel > > about broadening the scope from 'VM_BUG_ON(!root)' to > > 'VM_BUG_ON(!root > > > > (root_nr == NR_SECTION_ROOTS))'? > > As far as I understand the existing VM_BUG_ON will hit when the > mem_section tree gets corrupted. This is a different situation to an > incorrect section given so I wouldn't really mix those two. And I > still > do not see much point to protect from unexpected input parameter as > this > is internal function as already pointed out. > Hi Michael, I was able to hit this problem as the system firmware had assigned the prototype pmem device an address range above the 128TB limit that we originally supported. This has since been lifted to 2PB with patch 4ffe713b7587b14695c9bec26a000fc88ef54895. As it stands, we cannot move this range lower as the high bits are dictated by the location the card is connected. Since the physical address of the memory is not controlled by the kernel, I believe we should catch (or at least make it easy to debug) the sitution where external firmware allocates physical addresses beyond that which the kernel supports.
On Tue 02-07-19 14:13:25, Alastair D'Silva wrote: > On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote: > > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote: > > [...] > > > Given that there is already a VM_BUG_ON in the code, how do you > > > feel > > > about broadening the scope from 'VM_BUG_ON(!root)' to > > > 'VM_BUG_ON(!root > > > > > (root_nr == NR_SECTION_ROOTS))'? > > > > As far as I understand the existing VM_BUG_ON will hit when the > > mem_section tree gets corrupted. This is a different situation to an > > incorrect section given so I wouldn't really mix those two. And I > > still > > do not see much point to protect from unexpected input parameter as > > this > > is internal function as already pointed out. > > > > Hi Michael, > > I was able to hit this problem as the system firmware had assigned the > prototype pmem device an address range above the 128TB limit that we > originally supported. This has since been lifted to 2PB with patch > 4ffe713b7587b14695c9bec26a000fc88ef54895. > > As it stands, we cannot move this range lower as the high bits are > dictated by the location the card is connected. > > Since the physical address of the memory is not controlled by the > kernel, I believe we should catch (or at least make it easy to debug) > the sitution where external firmware allocates physical addresses > beyond that which the kernel supports. Just make it clear, I am not against a sanitization. I am objecting to put it into __section_nr because this is way too late. As already explained, you already must have a bogus mem_section object in hand. Why cannot you add a sanity check right there when the memory is added? Either when the section is registered or even sooner in arch_add_memory.
On Tue, 2019-07-02 at 08:13 +0200, Michal Hocko wrote: > On Tue 02-07-19 14:13:25, Alastair D'Silva wrote: > > On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote: > > > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote: > > > [...] > > > > Given that there is already a VM_BUG_ON in the code, how do you > > > > feel > > > > about broadening the scope from 'VM_BUG_ON(!root)' to > > > > 'VM_BUG_ON(!root > > > > > > (root_nr == NR_SECTION_ROOTS))'? > > > > > > As far as I understand the existing VM_BUG_ON will hit when the > > > mem_section tree gets corrupted. This is a different situation to > > > an > > > incorrect section given so I wouldn't really mix those two. And I > > > still > > > do not see much point to protect from unexpected input parameter > > > as > > > this > > > is internal function as already pointed out. > > > > > > > Hi Michael, > > > > I was able to hit this problem as the system firmware had assigned > > the > > prototype pmem device an address range above the 128TB limit that > > we > > originally supported. This has since been lifted to 2PB with patch > > 4ffe713b7587b14695c9bec26a000fc88ef54895. > > > > As it stands, we cannot move this range lower as the high bits are > > dictated by the location the card is connected. > > > > Since the physical address of the memory is not controlled by the > > kernel, I believe we should catch (or at least make it easy to > > debug) > > the sitution where external firmware allocates physical addresses > > beyond that which the kernel supports. > > Just make it clear, I am not against a sanitization. I am objecting > to > put it into __section_nr because this is way too late. As already > explained, you already must have a bogus mem_section object in hand. > Why cannot you add a sanity check right there when the memory is > added? > Either when the section is registered or even sooner in > arch_add_memory. > Good point, I was thinking of a libnvdimm enhancement to check that the end address is in range, but a more generic solution is better.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c index f180427e48f4..9244c122abf1 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -585,13 +585,21 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn) struct memory_block *find_memory_block_hinted(struct mem_section *section, struct memory_block *hint) { - int block_id = base_memory_block_id(__section_nr(section)); + int block_id, section_nr; struct device *hintdev = hint ? &hint->dev : NULL; struct device *dev; + section_nr = __section_nr(section); + if (section_nr < 0) { + if (hintdev) + put_device(hintdev); + return NULL; + } + + block_id = base_memory_block_id(section_nr); dev = subsys_find_device_by_id(&memory_subsys, block_id, hintdev); - if (hint) - put_device(&hint->dev); + if (hintdev) + put_device(hintdev); if (!dev) return NULL; return to_memory_block(dev); @@ -664,6 +672,10 @@ static int init_memory_block(struct memory_block **memory, return -ENOMEM; scn_nr = __section_nr(section); + + if (scn_nr < 0) + return scn_nr; + mem->start_section_nr = base_memory_block_id(scn_nr) * sections_per_block; mem->end_section_nr = mem->start_section_nr + sections_per_block - 1; diff --git a/mm/sparse.c b/mm/sparse.c index fd13166949b5..57a1a3d9c1cf 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -113,10 +113,15 @@ int __section_nr(struct mem_section* ms) continue; if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT))) - break; + break; } VM_BUG_ON(!root); + if (root_nr == NR_SECTION_ROOTS) { + VM_BUG_ON(true); + + return -EINVAL; + } return (root_nr * SECTIONS_PER_ROOT) + (ms - root); }