Message ID | 20181205165621.5805-2-michel@daenzer.net (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] drm: Only #define DEBUG if CONFIG_DYNAMIC_DEBUG is disabled |
On 12/6/18 12:56 AM, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> All the output is related, so it should all be printed the same way.
> Some of it was using pr_debug, but some of it appeared in dmesg by
> default. The caller should handle failure, so there's no need to spam
> dmesg with potentially quite a lot of output by default.
>
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Sounds reasonable, but personally prefer to show error when some
vital incident happens, e.g. no memory on eviction.

Regards,
Jerry

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 39 ++++++++++++++++++------------------
>  1 file changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d87935bf8e30..5e9b9dd91629 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -77,38 +77,39 @@ static inline int ttm_mem_type_from_place(const struct ttm_place *place,
>  	return 0;
>  }
>
> -static void ttm_mem_type_debug(struct ttm_bo_device *bdev, int mem_type)
> +static void ttm_mem_type_debug(struct ttm_bo_device *bdev, struct drm_printer *p,
> +			       int mem_type)
>  {
>  	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
> -	struct drm_printer p = drm_debug_printer(TTM_PFX);
>
> -	pr_err(" has_type: %d\n", man->has_type);
> -	pr_err(" use_type: %d\n", man->use_type);
> -	pr_err(" flags: 0x%08X\n", man->flags);
> -	pr_err(" gpu_offset: 0x%08llX\n", man->gpu_offset);
> -	pr_err(" size: %llu\n", man->size);
> -	pr_err(" available_caching: 0x%08X\n", man->available_caching);
> -	pr_err(" default_caching: 0x%08X\n", man->default_caching);
> +	drm_printf(p, " has_type: %d\n", man->has_type);
> +	drm_printf(p, " use_type: %d\n", man->use_type);
> +	drm_printf(p, " flags: 0x%08X\n", man->flags);
> +	drm_printf(p, " gpu_offset: 0x%08llX\n", man->gpu_offset);
> +	drm_printf(p, " size: %llu\n", man->size);
> +	drm_printf(p, " available_caching: 0x%08X\n", man->available_caching);
> +	drm_printf(p, " default_caching: 0x%08X\n", man->default_caching);
>  	if (mem_type != TTM_PL_SYSTEM)
> -		(*man->func->debug)(man, &p);
> +		(*man->func->debug)(man, p);
>  }
>
>  static void ttm_bo_mem_space_debug(struct ttm_buffer_object *bo,
>  					struct ttm_placement *placement)
>  {
> +	struct drm_printer p = drm_debug_printer(TTM_PFX);
>  	int i, ret, mem_type;
>
> -	pr_err("No space for %p (%lu pages, %luK, %luM)\n",
> -	       bo, bo->mem.num_pages, bo->mem.size >> 10,
> -	       bo->mem.size >> 20);
> +	drm_printf(&p, "No space for %p (%lu pages, %luK, %luM)\n",
> +		   bo, bo->mem.num_pages, bo->mem.size >> 10,
> +		   bo->mem.size >> 20);
>  	for (i = 0; i < placement->num_placement; i++) {
>  		ret = ttm_mem_type_from_place(&placement->placement[i],
>  						&mem_type);
>  		if (ret)
>  			return;
> -		pr_err(" placement[%d]=0x%08X (%d)\n",
> -		       i, placement->placement[i].flags, mem_type);
> -		ttm_mem_type_debug(bo->bdev, mem_type);
> +		drm_printf(&p, " placement[%d]=0x%08X (%d)\n",
> +			   i, placement->placement[i].flags, mem_type);
> +		ttm_mem_type_debug(bo->bdev, &p, mem_type);
>  	}
>  }
>
> @@ -728,8 +729,8 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>  	ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
>  	if (ret) {
>  		if (ret != -ERESTARTSYS) {
> -			pr_err("Failed to find memory space for buffer 0x%p eviction\n",
> -			       bo);
> +			pr_debug("Failed to find memory space for buffer 0x%p eviction\n",
> +				 bo);
>  			ttm_bo_mem_space_debug(bo, &placement);
>  		}
>  		goto out;
> @@ -738,7 +739,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>  	ret = ttm_bo_handle_move_mem(bo, &evict_mem, true, ctx);
>  	if (unlikely(ret)) {
>  		if (ret != -ERESTARTSYS)
> -			pr_err("Buffer eviction failed\n");
> +			pr_debug("Buffer eviction failed\n");
>  		ttm_bo_mem_put(bo, &evict_mem);
>  		goto out;
>  	}
On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> All the output is related, so it should all be printed the same way.
>> Some of it was using pr_debug, but some of it appeared in dmesg by
>> default. The caller should handle failure, so there's no need to spam
>> dmesg with potentially quite a lot of output by default.
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> Sounds reasonable, but personally prefer to show error when some
> vital incident happens, e.g. no memory on eviction.

The amdgpu driver still prints these in that case:

[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!

That's plenty as far as I'm concerned. :)
On 06.12.18 at 10:09, Michel Dänzer wrote:
> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> All the output is related, so it should all be printed the same way.
>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>> default. The caller should handle failure, so there's no need to spam
>>> dmesg with potentially quite a lot of output by default.
>>>
>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>> Sounds reasonable, but personally prefer to show error when some
>> vital incident happens, e.g. no memory on eviction.
> The amdgpu driver still prints these in that case:
>
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
>
> That's plenty as far as I'm concerned. :)

Yeah, but in this case I would rather make the amdgpu messages debug
level and leave the TTM messages on error level.

Christian.
On 2018-12-06 10:33 a.m., Koenig, Christian wrote:
> On 06.12.18 at 10:09, Michel Dänzer wrote:
>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>
>>>> All the output is related, so it should all be printed the same way.
>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>> default. The caller should handle failure, so there's no need to spam
>>>> dmesg with potentially quite a lot of output by default.
>>>>
>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>> Sounds reasonable, but personally prefer to show error when some
>>> vital incident happens, e.g. no memory on eviction.
>> The amdgpu driver still prints these in that case:
>>
>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
>>
>> That's plenty as far as I'm concerned. :)
>
> Yeah, but in this case I would rather make the amdgpu messages debug
> level and leave the TTM messages on error level.

That makes no sense to me.

The amdgpu messages have some value for normal users / bug reports,
indicating that something isn't going quite as planned.

The TTM messages are orders of magnitude longer, and are basically noise
for a normal user.

Seems like a no-brainer to me which of these should be visible by default.
On 12/6/18 5:33 PM, Koenig, Christian wrote:
> On 06.12.18 at 10:09, Michel Dänzer wrote:
>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>
>>>> All the output is related, so it should all be printed the same way.
>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>> default. The caller should handle failure, so there's no need to spam
>>>> dmesg with potentially quite a lot of output by default.
>>>>
>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>> Sounds reasonable, but personally prefer to show error when some
>>> vital incident happens, e.g. no memory on eviction.
>> The amdgpu driver still prints these in that case:
>>
>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!

That's from cs submit, perhaps it may come from other places by
ttm_bo_evict_mm().
Is that right? Christian.

Regards,
Jerry

>>
>> That's plenty as far as I'm concerned. :)
> Yeah, but in this case I would rather make the amdgpu messages debug
> level and leave the TTM messages on error level.
>
> Christian.
On 06.12.18 at 10:39, Zhang, Jerry(Junwei) wrote:
> On 12/6/18 5:33 PM, Koenig, Christian wrote:
>> On 06.12.18 at 10:09, Michel Dänzer wrote:
>>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>>
>>>>> All the output is related, so it should all be printed the same way.
>>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>>> default. The caller should handle failure, so there's no need to spam
>>>>> dmesg with potentially quite a lot of output by default.
>>>>>
>>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>>> Sounds reasonable, but personally prefer to show error when some
>>>> vital incident happens, e.g. no memory on eviction.
>>> The amdgpu driver still prints these in that case:
>>>
>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
>>> amdgpu_cs_list_validate(validated) failed.
>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for
>>> command submission!
>
> That's from cs submit, perhaps it may come from other places by
> ttm_bo_evict_mm().
> Is that right? Christian.

Yeah, exactly my thinking as well. When we silence the TTM messages we
might miss those cases.

In addition, other drivers using TTM might not have those messages
either.

If TTM is too noisy we should use ratelimit and/or reduce the number and
size of the warning messages.

A simple "Warning, I ran out of memory during eviction!" should do.

Regards,
Christian.

>
> Regards,
> Jerry
>>>
>>> That's plenty as far as I'm concerned. :)
>> Yeah, but in this case I would rather make the amdgpu messages debug
>> level and leave the TTM messages on error level.
>>
>> Christian.
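For illustration only, the rate-limiting idea raised in the message above can be sketched in plain userspace C. This is not the kernel's ratelimit API: `struct ratelimit`, `ratelimit_ok()` and `report_eviction_failure()` are hypothetical names, and time is passed in as an abstract tick count to keep the behaviour easy to follow. Under those assumptions, the sketch allows at most `burst` messages per `interval` ticks and counts the rest as suppressed:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a rate-limit state (not the kernel struct). */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Returns true if the caller may print at tick 'now'. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	/* Start a fresh window once the previous one has elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Emits one short warning if the limiter allows it; returns 1 if printed, 0 if suppressed. */
static int report_eviction_failure(struct ratelimit *rs, long now)
{
	if (!ratelimit_ok(rs, now))
		return 0;
	fprintf(stderr, "Warning, ran out of memory during eviction\n");
	return 1;
}
```

With `interval = 10` and `burst = 2`, calls at ticks 0 and 1 print, a call at tick 2 is suppressed, and a call at tick 10 prints again because a new window has started. In the kernel itself, a helper such as `pr_err_ratelimited()` would presumably play this role.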
On 2018-12-06 10:38 a.m., Michel Dänzer wrote:
> On 2018-12-06 10:33 a.m., Koenig, Christian wrote:
>> On 06.12.18 at 10:09, Michel Dänzer wrote:
>>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>>
>>>>> All the output is related, so it should all be printed the same way.
>>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>>> default. The caller should handle failure, so there's no need to spam
>>>>> dmesg with potentially quite a lot of output by default.
>>>>>
>>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>>> Sounds reasonable, but personally prefer to show error when some
>>>> vital incident happens, e.g. no memory on eviction.
>>> The amdgpu driver still prints these in that case:
>>>
>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
>>>
>>> That's plenty as far as I'm concerned. :)
>>
>> Yeah, but in this case I would rather make the amdgpu messages debug
>> level and leave the TTM messages on error level.
>
> That makes no sense to me.
>
> The amdgpu messages have some value for normal users / bug reports,
> indicating that something isn't going quite as planned.
>
> The TTM messages are orders of magnitude longer, and are basically noise
> for a normal user.
>
> Seems like a no-brainer to me which of these should be visible by default.

Moreover, not every case producing the driver output also produces the
TTM output, so it could make it difficult to realize that there's a
memory pressure situation.
On 2018-12-06 10:49 a.m., Christian König wrote:
> On 06.12.18 at 10:39, Zhang, Jerry(Junwei) wrote:
>> On 12/6/18 5:33 PM, Koenig, Christian wrote:
>>> On 06.12.18 at 10:09, Michel Dänzer wrote:
>>>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>>>
>>>>>> All the output is related, so it should all be printed the same way.
>>>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>>>> default. The caller should handle failure, so there's no need to spam
>>>>>> dmesg with potentially quite a lot of output by default.
>>>>>>
>>>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>>>> Sounds reasonable, but personally prefer to show error when some
>>>>> vital incident happens, e.g. no memory on eviction.
>>>> The amdgpu driver still prints these in that case:
>>>>
>>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
>>>> amdgpu_cs_list_validate(validated) failed.
>>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for
>>>> command submission!
>>
>> That's from cs submit, perhaps it may come from other places by
>> ttm_bo_evict_mm().
>> Is that right? Christian.
>
> Yeah, exactly my thinking as well. When we silence the TTM messages we
> might miss those cases.
>
> In addition, other drivers using TTM might not have those messages
> either.
>
> If TTM is too noisy we should use ratelimit and/or reduce the number and
> size of the warning messages.
>
> A simple "Warning, I ran out of memory during eviction!" should do.

Just dropping the last hunk of this patch should do then. I can do that.
On Thu, 2018-12-06 at 17:39 +0800, Zhang, Jerry(Junwei) wrote:
> On 12/6/18 5:33 PM, Koenig, Christian wrote:
> > On 06.12.18 at 10:09, Michel Dänzer wrote:
> > > On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
> > > > On 12/6/18 12:56 AM, Michel Dänzer wrote:
> > > > > From: Michel Dänzer <michel.daenzer@amd.com>
> > > > >
> > > > > All the output is related, so it should all be printed the same way.
> > > > > Some of it was using pr_debug, but some of it appeared in dmesg by
> > > > > default. The caller should handle failure, so there's no need to spam
> > > > > dmesg with potentially quite a lot of output by default.
> > > > >
> > > > > Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> > > > Sounds reasonable, but personally prefer to show error when some
> > > > vital incident happens, e.g. no memory on eviction.
> > > The amdgpu driver still prints these in that case:
> > >
> > > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
> > > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!

Aren't dump_stack()s already done on all these allocation failures?
I don't notice any use of __GFP_NOWARN on generic allocations in drm.
On 2018-12-06 5:46 p.m., Joe Perches wrote:
> On Thu, 2018-12-06 at 17:39 +0800, Zhang, Jerry(Junwei) wrote:
>> On 12/6/18 5:33 PM, Koenig, Christian wrote:
>>> On 06.12.18 at 10:09, Michel Dänzer wrote:
>>>> On 2018-12-06 3:43 a.m., Zhang, Jerry(Junwei) wrote:
>>>>> On 12/6/18 12:56 AM, Michel Dänzer wrote:
>>>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>>>
>>>>>> All the output is related, so it should all be printed the same way.
>>>>>> Some of it was using pr_debug, but some of it appeared in dmesg by
>>>>>> default. The caller should handle failure, so there's no need to spam
>>>>>> dmesg with potentially quite a lot of output by default.
>>>>>>
>>>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>>>> Sounds reasonable, but personally prefer to show error when some
>>>>> vital incident happens, e.g. no memory on eviction.
>>>> The amdgpu driver still prints these in that case:
>>>>
>>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
>>>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
>
> Aren't dump_stack()s already done on all these allocation failures?
> I don't notice any use of __GFP_NOWARN on generic allocations in drm.

Most of the time, these messages are due to being unable to allocate a
TTM BO or move it where it needs to go, not due to the kernel failing to
allocate memory in general.
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d87935bf8e30..5e9b9dd91629 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -77,38 +77,39 @@ static inline int ttm_mem_type_from_place(const struct ttm_place *place,
 	return 0;
 }
 
-static void ttm_mem_type_debug(struct ttm_bo_device *bdev, int mem_type)
+static void ttm_mem_type_debug(struct ttm_bo_device *bdev, struct drm_printer *p,
+			       int mem_type)
 {
 	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
-	struct drm_printer p = drm_debug_printer(TTM_PFX);
 
-	pr_err(" has_type: %d\n", man->has_type);
-	pr_err(" use_type: %d\n", man->use_type);
-	pr_err(" flags: 0x%08X\n", man->flags);
-	pr_err(" gpu_offset: 0x%08llX\n", man->gpu_offset);
-	pr_err(" size: %llu\n", man->size);
-	pr_err(" available_caching: 0x%08X\n", man->available_caching);
-	pr_err(" default_caching: 0x%08X\n", man->default_caching);
+	drm_printf(p, " has_type: %d\n", man->has_type);
+	drm_printf(p, " use_type: %d\n", man->use_type);
+	drm_printf(p, " flags: 0x%08X\n", man->flags);
+	drm_printf(p, " gpu_offset: 0x%08llX\n", man->gpu_offset);
+	drm_printf(p, " size: %llu\n", man->size);
+	drm_printf(p, " available_caching: 0x%08X\n", man->available_caching);
+	drm_printf(p, " default_caching: 0x%08X\n", man->default_caching);
 	if (mem_type != TTM_PL_SYSTEM)
-		(*man->func->debug)(man, &p);
+		(*man->func->debug)(man, p);
 }
 
 static void ttm_bo_mem_space_debug(struct ttm_buffer_object *bo,
 					struct ttm_placement *placement)
 {
+	struct drm_printer p = drm_debug_printer(TTM_PFX);
 	int i, ret, mem_type;
 
-	pr_err("No space for %p (%lu pages, %luK, %luM)\n",
-	       bo, bo->mem.num_pages, bo->mem.size >> 10,
-	       bo->mem.size >> 20);
+	drm_printf(&p, "No space for %p (%lu pages, %luK, %luM)\n",
+		   bo, bo->mem.num_pages, bo->mem.size >> 10,
+		   bo->mem.size >> 20);
 	for (i = 0; i < placement->num_placement; i++) {
 		ret = ttm_mem_type_from_place(&placement->placement[i],
 						&mem_type);
 		if (ret)
 			return;
-		pr_err(" placement[%d]=0x%08X (%d)\n",
-		       i, placement->placement[i].flags, mem_type);
-		ttm_mem_type_debug(bo->bdev, mem_type);
+		drm_printf(&p, " placement[%d]=0x%08X (%d)\n",
+			   i, placement->placement[i].flags, mem_type);
+		ttm_mem_type_debug(bo->bdev, &p, mem_type);
 	}
 }
 
@@ -728,8 +729,8 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
 	if (ret) {
 		if (ret != -ERESTARTSYS) {
-			pr_err("Failed to find memory space for buffer 0x%p eviction\n",
-			       bo);
+			pr_debug("Failed to find memory space for buffer 0x%p eviction\n",
+				 bo);
 			ttm_bo_mem_space_debug(bo, &placement);
 		}
 		goto out;
@@ -738,7 +739,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	ret = ttm_bo_handle_move_mem(bo, &evict_mem, true, ctx);
 	if (unlikely(ret)) {
 		if (ret != -ERESTARTSYS)
-			pr_err("Buffer eviction failed\n");
+			pr_debug("Buffer eviction failed\n");
 		ttm_bo_mem_put(bo, &evict_mem);
 		goto out;
 	}
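
As a rough illustration of the refactoring in the first hunk, the following userspace sketch shows a helper that receives a printer from its caller instead of creating its own, so the header and the per-type details end up in the same sink. `struct printer`, `printer_printf()`, `struct mem_type`, `mem_type_debug()` and `mem_space_debug()` are hypothetical stand-ins, not the DRM API; the sink here is simply a character buffer:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a printer object: one sink, chosen once by the caller. */
struct printer {
	char *buf;	/* destination buffer */
	size_t size;	/* capacity of buf, including the terminating NUL */
	size_t len;	/* bytes written so far */
};

/* Appends formatted text to the printer's buffer, truncating when full. */
static void printer_printf(struct printer *p, const char *fmt, ...)
{
	va_list args;
	int n;

	if (p->len >= p->size)
		return;
	va_start(args, fmt);
	n = vsnprintf(p->buf + p->len, p->size - p->len, fmt, args);
	va_end(args);
	if (n < 0)
		return;
	/* On truncation vsnprintf reports the length it wanted to write. */
	if ((size_t)n >= p->size - p->len)
		p->len = p->size - 1;
	else
		p->len += (size_t)n;
}

/* Hypothetical, heavily reduced memory-type description. */
struct mem_type {
	int has_type;
	int use_type;
};

/* Helper: prints through the printer it is handed, never its own. */
static void mem_type_debug(struct printer *p, const struct mem_type *man)
{
	printer_printf(p, " has_type: %d\n", man->has_type);
	printer_printf(p, " use_type: %d\n", man->use_type);
}

/* Caller: sends the header and every per-type dump to the same printer. */
static void mem_space_debug(struct printer *p, const struct mem_type *types, int n)
{
	int i;

	printer_printf(p, "No space, %d placements\n", n);
	for (i = 0; i < n; i++) {
		printer_printf(p, " placement[%d]\n", i);
		mem_type_debug(p, &types[i]);
	}
}
```

Because every line goes through the one printer, changing where (or whether) the output appears is a single decision at the point where the printer is set up, which is the consistency the commit message argues for.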