Message ID | 1530867675-9018-7-git-send-email-hejianet@gmail.com |
---|---|
State | New, archived |
On 7/6/18 5:01 AM, Jia He wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But there is
> still some room for improvement. E.g. in early_pfn_valid(), if pfn and
> pfn+1 are in the same memblock region, we can record the last returned
> memblock region index and check whether pfn++ is still in the same
> region.
>
> Currently it only improves the performance on arm/arm64 and will have no
> impact on other arches.
>
> For the performance improvement, after this set, I can see the time
> overhead of memmap_init() is reduced from 27956us to 13537us on my
> armv8a server (QDF2400 with 96G memory, pagesize 64k).

This series would be a lot simpler if patches 4, 5, and 6 were dropped.
The extra complexity does not make sense to save 0.0001s/T during not.

Patches 1-3 look OK, but without patches 4-5 __init_memblock should be
made local static as I suggested earlier.

So, I think Jia should re-spin this series with only 3 patches. Or,
Andrew could remove them from linux-next before merge.

Thank you,
Pavel
On 8/16/18 9:35 PM, Pasha Tatashin wrote:
>
> On 7/6/18 5:01 AM, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But there is
>> still some room for improvement. E.g. in early_pfn_valid(), if pfn and
>> pfn+1 are in the same memblock region, we can record the last returned
>> memblock region index and check whether pfn++ is still in the same
>> region.
>>
>> Currently it only improves the performance on arm/arm64 and will have no
>> impact on other arches.
>>
>> For the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 27956us to 13537us on my
>> armv8a server (QDF2400 with 96G memory, pagesize 64k).
>
> This series would be a lot simpler if patches 4, 5, and 6 were dropped.
> The extra complexity does not make sense to save 0.0001s/T during not.

s/not/boot

> Patches 1-3 look OK, but without patches 4-5 __init_memblock should be
> made local static as I suggested earlier.

s/__init_memblock/early_region_idx
Hi Pasha

Thanks for the comments

On 8/17/2018 9:35 AM, Pasha Tatashin wrote:
>
> On 7/6/18 5:01 AM, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But there is
>> still some room for improvement. E.g. in early_pfn_valid(), if pfn and
>> pfn+1 are in the same memblock region, we can record the last returned
>> memblock region index and check whether pfn++ is still in the same
>> region.
>>
>> Currently it only improves the performance on arm/arm64 and will have no
>> impact on other arches.
>>
>> For the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 27956us to 13537us on my
>> armv8a server (QDF2400 with 96G memory, pagesize 64k).
>
> This series would be a lot simpler if patches 4, 5, and 6 were dropped.
> The extra complexity does not make sense to save 0.0001s/T during not.
>
> Patches 1-3 look OK, but without patches 4-5 __init_memblock should be
> made local static as I suggested earlier.
>
> So, I think Jia should re-spin this series with only 3 patches. Or,
> Andrew could remove them from linux-next before merge.
>

I will respin it with patches #1-#3 if there are no more comments.

Cheers,
Jia

> Thank you,
> Pavel
>
```diff
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 57cdc42..83b1d11 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,11 +1267,16 @@ static inline int pfn_present(unsigned long pfn)
 #define pfn_to_nid(pfn)		(0)
 #endif
 
-#define early_pfn_valid(pfn)	pfn_valid(pfn)
 #ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
 extern ulong memblock_next_valid_pfn(ulong pfn);
 #define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
-#endif
+
+extern int pfn_valid_region(ulong pfn);
+#define early_pfn_valid(pfn)	pfn_valid_region(pfn)
+#else
+#define early_pfn_valid(pfn)	pfn_valid(pfn)
+#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
+
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
```
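The body of pfn_valid_region() lives in mm/memblock.c and is not part of this hunk. Below is a minimal sketch of the cached-index idea being discussed; the static early_region_idx cache is the variable the reviewers reference, but the linear fallback search is an illustrative assumption (the series itself reuses memblock's binary search):

```c
/*
 * Sketch only: the linear fallback search below is illustrative;
 * the actual series reuses memblock's binary search over
 * memblock.memory.
 */
#include <linux/memblock.h>
#include <linux/pfn.h>

static int early_region_idx __initdata_memblock = -1;

int pfn_valid_region(ulong pfn)
{
	ulong start_pfn, end_pfn;
	struct memblock_type *type = &memblock.memory;
	struct memblock_region *regions = type->regions;

	/* Fast path: pfn usually falls in the region that matched last time. */
	if (early_region_idx != -1) {
		start_pfn = PFN_DOWN(regions[early_region_idx].base);
		end_pfn = PFN_DOWN(regions[early_region_idx].base +
				   regions[early_region_idx].size);
		if (pfn >= start_pfn && pfn < end_pfn)
			return !memblock_is_nomap(&regions[early_region_idx]);
	}

	/* Slow path: search the regions and remember the matching index. */
	for (early_region_idx = 0; early_region_idx < type->cnt;
	     early_region_idx++) {
		start_pfn = PFN_DOWN(regions[early_region_idx].base);
		end_pfn = PFN_DOWN(regions[early_region_idx].base +
				   regions[early_region_idx].size);
		if (pfn >= start_pfn && pfn < end_pfn)
			return !memblock_is_nomap(&regions[early_region_idx]);
	}

	early_region_idx = -1;
	return 0;
}
```

Keeping early_region_idx static to mm/memblock.c, as in this sketch, is the scoping change Pasha asks for above when patches 4-5 are dropped.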
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check whether pfn++ is still in the same
region.

Currently it only improves the performance on arm/arm64 and will have no
impact on other arches.

For the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 27956us to 13537us on my
armv8a server (QDF2400 with 96G memory, pagesize 64k).

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/mmzone.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
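For context on where early_pfn_valid() is hot: memmap_init_zone() evaluates it once per pfn, which on a 96G machine with 64k pages means millions of calls, so caching the last matched region pays off. A heavily simplified sketch of that consumer loop is below; the real loop in mm/page_alloc.c of that era also handles deferred struct page init and other cases, and the names beyond early_pfn_valid()/next_valid_pfn() come from the kernel tree, not from this patch:

```c
/*
 * Heavily simplified sketch of memmap_init_zone()'s hot loop;
 * deferred init, mirrored memory and other details are omitted.
 */
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
	if (!early_pfn_valid(pfn)) {
		/* Jump straight past the whole invalid range. */
		pfn = next_valid_pfn(pfn) - 1;
		continue;
	}
	if (!early_pfn_in_nid(pfn, nid))
		continue;
	__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
}
```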