Message ID | 52A07BBE.7060507@ti.com (mailing list archive)
---|---
State | New, archived
Hello,

On Thu, Dec 05, 2013 at 03:12:30PM +0200, Grygorii Strashko wrote:
> I'll try to provide more technical details here.
> As Santosh mentioned in previous e-mails, it's not easy to simply
> get rid of using MAX_NUMNODES:
> 1) we introduce the new interface memblock_allocX
> 2) our interface uses the memblock APIs __next_free_mem_range_rev()
>    and __next_free_mem_range()
> 3) __next_free_mem_range_rev() and __next_free_mem_range() use MAX_NUMNODES
> 4) __next_free_mem_range_rev() and __next_free_mem_range() are also used
>    standalone, outside of our interface, as part of *for_each_free_mem_range*
>    or for_each_mem_pfn_range ...
>
> Point [4] makes it necessary to find and correct all the places where the
> memblock APIs are used and where MAX_NUMNODES is expected as an input
> parameter. The major problem is that a simple "grep" will not work, because
> the memblock API calls are hidden inside other MM modules and it's not
> always clear what will be passed as input parameters to the APIs of those
> MM modules (for example, sparse_memory_present_with_active_regions() or
> sparse.c).

Isn't that kinda trivial to work around?  Make those functions accept
both MAX_NUMNODES and NUMA_NO_NODE but emit a warning on MAX_NUMNODES
(preferably throttled reasonably).  Given the history of the API, we'd
probably want to keep such a warning for an extended period of time, but
that's what we'd need to do no matter what.

> As a result, the WIP patch I did, which was posted by Santosh, illustrates
> the probable size and complexity of the change.

Again, I don't really mind the order in which things happen, but I don't
think it's a good idea to spread misuse with a new API.  You've got to
deal with it one way or the other.

> Sorry, but the question here is not "to do or not to do?" but rather
> "how to do it?", taking into account the complexity and state of the
> current MM code. For example, would it be OK if I worked around the
> issue as in the attached patch?

Well, it's more a question of when.  It's not really a technically
difficult task, and all I'm saying is it had better be sooner than later.

Thanks.
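(The scheme Tejun suggests can be sketched in a few lines. This is an
illustration only, not the posted patch: the helper name
memblock_fixup_nid() is invented here, while WARN_ONCE() is the kernel's
stock fire-once warning macro; it returns its condition, so it doubles
as the test.)

#include <linux/bug.h>		/* WARN_ONCE() */
#include <linux/numa.h>		/* NUMA_NO_NODE, MAX_NUMNODES */

/*
 * Sketch of a transition helper: accept both sentinels, complain (once)
 * about the deprecated one, and hand everything downstream a
 * NUMA_NO_NODE-only view.  The name is hypothetical.
 */
static inline int memblock_fixup_nid(int nid)
{
	if (WARN_ONCE(nid == MAX_NUMNODES,
		      "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
		return NUMA_NO_NODE;
	return nid;
}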
On Thursday 05 December 2013 11:59 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 05, 2013 at 03:12:30PM +0200, Grygorii Strashko wrote:
>> [...]
>> The major problem is that a simple "grep" will not work, because the
>> memblock API calls are hidden inside other MM modules and it's not
>> always clear what will be passed as input parameters to the APIs of
>> those MM modules (for example,
>> sparse_memory_present_with_active_regions() or sparse.c).
>
> Isn't that kinda trivial to work around?  Make those functions accept
> both MAX_NUMNODES and NUMA_NO_NODE but emit a warning on MAX_NUMNODES
> (preferably throttled reasonably).  Given the history of the API, we'd
> probably want to keep such a warning for an extended period of time, but
> that's what we'd need to do no matter what.
>
Looks like a good idea.

>> As a result, the WIP patch I did, which was posted by Santosh, illustrates
>> the probable size and complexity of the change.
>
> Again, I don't really mind the order in which things happen, but I don't
> think it's a good idea to spread misuse with a new API.  You've got to
> deal with it one way or the other.
>
>> Sorry, but the question here is not "to do or not to do?" but rather
>> "how to do it?" [...] Would it be OK if I worked around the issue as in
>> the attached patch?
>
> Well, it's more a question of when.  It's not really a technically
> difficult task, and all I'm saying is it had better be sooner than later.
>
Fair enough. Based on your suggestion, we will try to see if we can
proceed with 4), accepting both MAX_NUMNODES and NUMA_NO_NODE.

Thanks for the suggestion.

regards,
Santosh
On Thu, 5 Dec 2013 12:13:16 -0500 Santosh Shilimkar <santosh.shilimkar@ti.com> wrote:

> On Thursday 05 December 2013 11:59 AM, Tejun Heo wrote:
>> [...]
>> Well, it's more a question of when.  It's not really a technically
>> difficult task, and all I'm saying is it had better be sooner than later.
>>
> Fair enough. Based on your suggestion, we will try to see if we can
> proceed with 4), accepting both MAX_NUMNODES and NUMA_NO_NODE.
>
> Thanks for the suggestion.

So where do we now stand with this MAX_NUMNODES-vs-NUMA_NO_NODE mess?
Is the conversion to NUMA_NO_NODE in current linux-next completed and
nicely tested?

Thanks.
On Friday 10 January 2014 07:53 PM, Andrew Morton wrote:
> On Thu, 5 Dec 2013 12:13:16 -0500 Santosh Shilimkar <santosh.shilimkar@ti.com> wrote:
>> [...]
>> Fair enough. Based on your suggestion, we will try to see if we can
>> proceed with 4), accepting both MAX_NUMNODES and NUMA_NO_NODE.
>
> So where do we now stand with this MAX_NUMNODES-vs-NUMA_NO_NODE mess?
> Is the conversion to NUMA_NO_NODE in current linux-next completed and
> nicely tested?
>
From all the reports so far, there were actually only 3 places in x86
code using MAX_NUMNODES, and the fix for those is already in your queue.
So I guess we are good on that aspect now.

Regards,
Santosh
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 9e67fe4..84e778d 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -171,20 +171,20 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
 
 #define memblock_virt_alloc(x) \
 	memblock_virt_alloc_try_nid(x, 0, BOOTMEM_LOW_LIMIT, \
-				    BOOTMEM_ALLOC_ACCESSIBLE, MAX_NUMNODES)
+				    BOOTMEM_ALLOC_ACCESSIBLE, NUMA_NO_NODE)
 #define memblock_virt_alloc_align(x, align) \
 	memblock_virt_alloc_try_nid(x, align, BOOTMEM_LOW_LIMIT, \
-				    BOOTMEM_ALLOC_ACCESSIBLE, MAX_NUMNODES)
+				    BOOTMEM_ALLOC_ACCESSIBLE, NUMA_NO_NODE)
 #define memblock_virt_alloc_nopanic(x) \
 	memblock_virt_alloc_try_nid_nopanic(x, 0, \
 					    BOOTMEM_LOW_LIMIT, \
 					    BOOTMEM_ALLOC_ACCESSIBLE, \
-					    MAX_NUMNODES)
+					    NUMA_NO_NODE)
 #define memblock_virt_alloc_align_nopanic(x, align) \
 	memblock_virt_alloc_try_nid_nopanic(x, align, \
 					    BOOTMEM_LOW_LIMIT, \
 					    BOOTMEM_ALLOC_ACCESSIBLE, \
-					    MAX_NUMNODES)
+					    NUMA_NO_NODE)
 #define memblock_virt_alloc_node(x, nid) \
 	memblock_virt_alloc_try_nid(x, 0, BOOTMEM_LOW_LIMIT, \
 				    BOOTMEM_ALLOC_ACCESSIBLE, nid)
diff --git a/mm/memblock.c b/mm/memblock.c
index 1503300..cae02a1 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -945,7 +945,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  * @max_addr: the upper bound of the memory region from where the allocation
  *	  is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
  *	  allocate only from memory limited by memblock.current_limit value
- * @nid: nid of the free area to find, %MAX_NUMNODES for any node
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
  * The @min_addr limit is dropped if it can not be satisfied and the allocation
  * will fall back to memory below @min_addr.
@@ -970,16 +970,27 @@ static void * __init _memblock_virt_alloc_try_nid_nopanic(
 	phys_addr_t alloc;
 	void *ptr;
 
+	/*
+	 * TODO: this is a workaround; we should get NUMA_NO_NODE as the input
+	 * parameter to work with any node, but there is no guarantee that we
+	 * always will.  Remove once memblock core is converted to NUMA_NO_NODE.
+	 */
+	nid = (nid == MAX_NUMNODES) ? NUMA_NO_NODE : nid;
+
 	if (WARN_ON_ONCE(slab_is_available())) {
-		if (nid == MAX_NUMNODES)
-			return kzalloc(size, GFP_NOWAIT);
-		else
-			return kzalloc_node(size, GFP_NOWAIT, nid);
+		return kzalloc_node(size, GFP_NOWAIT, nid);
 	}
 
 	if (!align)
 		align = SMP_CACHE_BYTES;
 
+	/*
+	 * TODO: this is a workaround; we get NUMA_NO_NODE as the input
+	 * parameter, but memblock core still uses MAX_NUMNODES.
+	 * Remove once memblock core is converted to NUMA_NO_NODE.
+	 */
+	nid = (nid == NUMA_NO_NODE) ? MAX_NUMNODES : nid;
+
 	/* align @size to avoid excessive fragmentation on reserved array */
 	size = round_up(size, align);
 
@@ -1028,7 +1039,7 @@ error:
  * @max_addr: the upper bound of the memory region from where the allocation
  *	  is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
  *	  allocate only from memory limited by memblock.current_limit value
- * @nid: nid of the free area to find, %MAX_NUMNODES for any node
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
  * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
  * additional debug information (including caller info), if enabled.
@@ -1056,7 +1067,7 @@ void * __init memblock_virt_alloc_try_nid_nopanic(
  * @max_addr: the upper bound of the memory region from where the allocation
  *	  is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
  *	  allocate only from memory limited by memblock.current_limit value
- * @nid: nid of the free area to find, %MAX_NUMNODES for any node
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
  * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
  * which provides debug information (including caller info), if enabled,
diff --git a/mm/percpu.c b/mm/percpu.c
index f74902c..55a798e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1853,7 +1853,7 @@ static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
 	return	memblock_virt_alloc_try_nid_nopanic(size, align,
 						    __pa(MAX_DMA_ADDRESS),
 						    BOOTMEM_ALLOC_ACCESSIBLE,
-						    MAX_NUMNODES);
+						    NUMA_NO_NODE);
 }
 
 static void __init pcpu_dfl_fc_free(void *ptr, size_t size)
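(For context on the patch above: the double translation, MAX_NUMNODES to
NUMA_NO_NODE at the public _memblock_virt_alloc_try_nid_nopanic()
boundary and then back to MAX_NUMNODES before calling into memblock
core, is what lets the new allocator API advertise NUMA_NO_NODE while
the core iterators still expect MAX_NUMNODES.  Below is a minimal
caller-side sketch of the converted convention; it assumes the 3.13-era
iterator signature for_each_free_mem_range(i, nid, p_start, p_end),
before the flags argument was added, and the helper name is
hypothetical.)

#include <linux/memblock.h>	/* for_each_free_mem_range() */
#include <linux/printk.h>

/* Hypothetical helper: dump every free memblock range on any node. */
static void __init dump_free_ranges(void)
{
	phys_addr_t start, end;
	u64 i;

	/* NUMA_NO_NODE now means "any node"; MAX_NUMNODES would warn. */
	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end)
		pr_info("free range: %pa..%pa\n", &start, &end);
}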