Message ID | 20230307144621.10748-7-thomas.hellstrom@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/ttm: Small fixes / cleanups in prep for shrinking |
Hi Thomas,

Thank you for the patch! Yet something to improve:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on drm-intel/for-linux-next]
[cannot apply to drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Fix-a-NULL-pointer-dereference/20230307-224931
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230307144621.10748-7-thomas.hellstrom%40linux.intel.com
patch subject: [Intel-gfx] [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
config: powerpc-randconfig-r006-20230306 (https://download.01.org/0day-ci/archive/20230308/202303080352.azyeWwwt-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):

        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0eee47dba298051fc49965d56cb17dd113ff0236
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Thomas-Hellstr-m/drm-ttm-Fix-a-NULL-pointer-dereference/20230307-224931
        git checkout 0eee47dba298051fc49965d56cb17dd113ff0236
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/gpu/drm/ttm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303080352.azyeWwwt-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In function 'ttm_pool_type_init',
       inlined from 'ttm_pool_init' at drivers/gpu/drm/ttm/ttm_pool.c:557:5:
>> drivers/gpu/drm/ttm/ttm_pool.c:264:18: warning: iteration 9 invokes undefined behavior [-Waggressive-loop-optimizations]
     264 |         pt->pool = pool;
         |         ~~~~~~~~~^~~~~~
   drivers/gpu/drm/ttm/ttm_pool.c: In function 'ttm_pool_init':
   drivers/gpu/drm/ttm/ttm_pool.c:556:39: note: within this loop
     556 |         for (j = 0; j < TTM_DIM_ORDER; ++j)
         |                                       ^
   In file included from <command-line>:
   drivers/gpu/drm/ttm/ttm_pool.c: In function 'ttm_pool_mgr_init':
>> include/linux/compiler_types.h:358:45: error: call to '__compiletime_assert_283' declared with attribute error: BUILD_BUG_ON failed: TTM_DIM_ORDER > MAX_ORDER
     358 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:339:25: note: in definition of macro '__compiletime_assert'
     339 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:358:9: note: in expansion of macro '_compiletime_assert'
     358 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |         ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/ttm/ttm_pool.c:744:9: note: in expansion of macro 'BUILD_BUG_ON'
     744 |         BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
         |         ^~~~~~~~~~~~

vim +/__compiletime_assert_283 +358 include/linux/compiler_types.h

eb5c2d4b45e3d2 Will Deacon 2020-07-21  344  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  345  #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21  346  	__compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  347  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  348  /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21  349   * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  350   * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21  351   * @msg:       a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  352   *
eb5c2d4b45e3d2 Will Deacon 2020-07-21  353   * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  354   * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  355   * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21  356   */
eb5c2d4b45e3d2 Will Deacon 2020-07-21  357  #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @358  	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  359  
Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> When swapping out, we will split multi-order pages both in order to
> move them to the swap-cache and to be able to return memory to the
> swap cache as soon as possible on a page-by-page basis.
> Reduce the page max order to the system PMD size, as we can then be nicer
> to the system and avoid splitting gigantic pages.

Mhm, we actually have a todo to start supporting giant pages at some time.

Using the folio directly just saves tons of overhead when you don't need
to allocate a 2MiB page array any more for each 1GiB you allocate.

But that probably needs tons of work anyway, so feel free to add my rb
for now.

Regards,
Christian.

>
> Looking forward to when we might be able to swap out PMD size folios
> without splitting, this will also be a benefit.
>
> v2:
> - Include all orders up to the PMD size (Christian König)
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
>   1 file changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 0b6e20613d19..939845d853af 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -47,6 +47,9 @@
>   
>   #include "ttm_module.h"
>   
> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> +
>   /**
>    * struct ttm_pool_dma - Helper object for coherent DMA mappings
>    *
> @@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
>   
>   static atomic_long_t allocated_pages;
>   
> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>   
> -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>   
>   static spinlock_t shrinker_lock;
>   static struct list_head shrinker_list;
> @@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	else
>   		gfp_flags |= GFP_HIGHUSER;
>   
> -	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
> +	for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
>   	     num_pages;
>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>   		struct ttm_pool_type *pt;
> @@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>   
>   	if (use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_init(&pool->caching[i].orders[j],
>   						   pool, i, j);
>   	}
> @@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>   
>   	if (pool->use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>   	}
>   
> @@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
>   	unsigned int i;
>   
>   	seq_puts(m, "\t ");
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " ---%2u---", i);
>   	seq_puts(m, "\n");
>   }
> @@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>   	seq_puts(m, "\n");
>   }
> @@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>   {
>   	unsigned int i;
>   
> +	BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
> +
>   	if (!page_pool_size)
>   		page_pool_size = num_pages;
>   
>   	spin_lock_init(&shrinker_lock);
>   	INIT_LIST_HEAD(&shrinker_list);
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_init(&global_write_combined[i], NULL,
>   				   ttm_write_combined, i);
>   		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
> @@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_fini(&global_write_combined[i]);
>   		ttm_pool_type_fini(&global_uncached[i]);
>
On 3/8/23 10:15, Christian König wrote:
> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>> When swapping out, we will split multi-order pages both in order to
>> move them to the swap-cache and to be able to return memory to the
>> swap cache as soon as possible on a page-by-page basis.
>> Reduce the page max order to the system PMD size, as we can then be
>> nicer
>> to the system and avoid splitting gigantic pages.
>
> Mhm, we actually have a todo to start supporting giant pages at some
> time.
>
> Using the folio directly just saves tons of overhead when you don't
> need to allocate a 2MiB page array any more for each 1GiB you allocate.
>
> But that probably needs tons of work anyway, so feel free to add my rb
> for now.

Thanks, I need to fix this anyway for powerpc where it seems
PMD_ORDER > MAX_ORDER :/

It might be we'd want to replace the ttm page arrays with
scatter-gather tables at some point? I think at least vmwgfx, i915 and
xe would benefit from that...

/Thomas

> Regards,
> Christian.
>
>>
>> Looking forward to when we might be able to swap out PMD size folios
>> without splitting, this will also be a benefit.
>>
>> v2:
>> - Include all orders up to the PMD size (Christian König)
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
>>   1 file changed, 16 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>> b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 0b6e20613d19..939845d853af 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -47,6 +47,9 @@
>>   #include "ttm_module.h"
>> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
>> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
>> +
>>   /**
>>    * struct ttm_pool_dma - Helper object for coherent DMA mappings
>>    *
>> @@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
>>   static atomic_long_t allocated_pages;
>> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
>> -static struct ttm_pool_type global_uncached[MAX_ORDER];
>> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
>> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>> -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
>> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>> +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
>> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>>   static spinlock_t shrinker_lock;
>>   static struct list_head shrinker_list;
>> @@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct
>> ttm_tt *tt,
>>   	else
>>   		gfp_flags |= GFP_HIGHUSER;
>> -	for (order = min_t(unsigned int, MAX_ORDER - 1,
>> __fls(num_pages));
>> +	for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
>>   	     num_pages;
>>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>>   		struct ttm_pool_type *pt;
>> @@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct
>> device *dev,
>>   	if (use_dma_alloc) {
>>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -			for (j = 0; j < MAX_ORDER; ++j)
>> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>>   				ttm_pool_type_init(&pool->caching[i].orders[j],
>>   						   pool, i, j);
>>   	}
>> @@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>   	if (pool->use_dma_alloc) {
>>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -			for (j = 0; j < MAX_ORDER; ++j)
>> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>>   	}
>> @@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct
>> seq_file *m)
>>   	unsigned int i;
>>   	seq_puts(m, "\t ");
>> -	for (i = 0; i < MAX_ORDER; ++i)
>> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>>   		seq_printf(m, " ---%2u---", i);
>>   	seq_puts(m, "\n");
>>   }
>> @@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct
>> ttm_pool_type *pt,
>>   {
>>   	unsigned int i;
>> -	for (i = 0; i < MAX_ORDER; ++i)
>> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>>   		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>>   	seq_puts(m, "\n");
>>   }
>> @@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>>   {
>>   	unsigned int i;
>> +	BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
>> +
>>   	if (!page_pool_size)
>>   		page_pool_size = num_pages;
>>   	spin_lock_init(&shrinker_lock);
>>   	INIT_LIST_HEAD(&shrinker_list);
>> -	for (i = 0; i < MAX_ORDER; ++i) {
>> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>   		ttm_pool_type_init(&global_write_combined[i], NULL,
>>   				   ttm_write_combined, i);
>>   		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached,
>> i);
>> @@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
>>   {
>>   	unsigned int i;
>> -	for (i = 0; i < MAX_ORDER; ++i) {
>> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>   		ttm_pool_type_fini(&global_write_combined[i]);
>>   		ttm_pool_type_fini(&global_uncached[i]);
>
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 0b6e20613d19..939845d853af 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -47,6 +47,9 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
+	for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
 	     num_pages;
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		struct ttm_pool_type *pt;
@@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
 	if (use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
 	}
@@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
 	if (pool->use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
 
@@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
 	unsigned int i;
 
 	seq_puts(m, "\t ");
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " ---%2u---", i);
 	seq_puts(m, "\n");
 }
@@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
 	seq_puts(m, "\n");
 }
@@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 {
 	unsigned int i;
 
+	BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
+
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
 		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_fini(&global_write_combined[i]);
 		ttm_pool_type_fini(&global_uncached[i]);
When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.

Reduce the page max order to the system PMD size, as we can then be nicer
to the system and avoid splitting gigantic pages.

Looking forward to when we might be able to swap out PMD size folios
without splitting, this will also be a benefit.

v2:
- Include all orders up to the PMD size (Christian König)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)