[19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
diff mbox series

Message ID 20200707201229.472834-20-daniel.vetter@ffwll.ch
State New
Headers show
Series
  • dma-fence annotations, round 3
Related show

Commit Message

Daniel Vetter July 7, 2020, 8:12 p.m. UTC
My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:

	kmem_cache_alloc+0x2b/0x6d0
	amdgpu_fence_emit+0x30/0x330 [amdgpu]
	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
	amdgpu_job_run+0x10f/0x260 [amdgpu]
	drm_sched_main+0x1b9/0x490 [gpu_sched]
	kthread+0x12e/0x150

Trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually
need !GFP_ATOMIC, since they can be called from any page laundering,
even if GFP_NOFS or GFP_NOIO are set.

I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.

Ofc real fix is to properly preallocate this fence and stuff it into
the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
the way.

v2: Two more allocations in scheduler paths.

Frist one:

	__kmalloc+0x58/0x720
	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
	drm_sched_main+0xf9/0x490 [gpu_sched]

Second one:

	kmem_cache_alloc+0x2b/0x6d0
	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
	drm_sched_main+0xf9/0x490 [gpu_sched]

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Comments

Daniel Vetter July 14, 2020, 10:49 a.m. UTC | #1
On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
> My dma-fence lockdep annotations caught an inversion because we
> allocate memory where we really shouldn't:
> 
> 	kmem_cache_alloc+0x2b/0x6d0
> 	amdgpu_fence_emit+0x30/0x330 [amdgpu]
> 	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> 	amdgpu_job_run+0x10f/0x260 [amdgpu]
> 	drm_sched_main+0x1b9/0x490 [gpu_sched]
> 	kthread+0x12e/0x150
> 
> Trouble right now is that lockdep only validates against GFP_FS, which
> would be good enough for shrinkers. But for mmu_notifiers we actually
> need !GFP_ATOMIC, since they can be called from any page laundering,
> even if GFP_NOFS or GFP_NOIO are set.
> 
> I guess we should improve the lockdep annotations for
> fs_reclaim_acquire/release.
> 
> Ofc real fix is to properly preallocate this fence and stuff it into
> the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> the way.
> 
> v2: Two more allocations in scheduler paths.
> 
> Frist one:
> 
> 	__kmalloc+0x58/0x720
> 	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> 	drm_sched_main+0xf9/0x490 [gpu_sched]
> 
> Second one:
> 
> 	kmem_cache_alloc+0x2b/0x6d0
> 	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> 	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> 	drm_sched_main+0xf9/0x490 [gpu_sched]
> 
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> Cc: linux-rdma@vger.kernel.org
> Cc: amd-gfx@lists.freedesktop.org
> Cc: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Has anyone from amd side started looking into how to fix this properly?

I looked a bit into fixing this with mempool, and the big guarantee we
need is that
- there's a hard upper limit on how many allocations we minimally need to
  guarantee forward progress. And the entire vmid allocation and
  amdgpu_sync_fence stuff kinda makes me question that's a valid
  assumption.

- mempool_free must be called without any locks in the way which are held
  while we call mempool_alloc. Otherwise we again have a nice deadlock
  with no forward progress. I tried auditing that, but got lost in amdgpu
  and scheduler code. Some lockdep annotations for mempool.c might help,
  but they're not going to catch everything. Plus it would be again manual
  annotations because this is yet another cross-release issue. So not sure
  that helps at all.

iow, not sure what to do here. Ideas?

Cheers, Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 8d84975885cd..a089a827fdfe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
>  	uint32_t seq;
>  	int r;
>  
> -	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> +	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
>  	if (fence == NULL)
>  		return -ENOMEM;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index 267fa45ddb66..a333ca2d4ddd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
>  	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
>  		return amdgpu_sync_fence(sync, ring->vmid_wait);
>  
> -	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> +	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
>  	if (!fences)
>  		return -ENOMEM;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index 8ea6c49529e7..af22b526cec9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
>  	if (amdgpu_sync_add_later(sync, f))
>  		return 0;
>  
> -	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> +	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
>  	if (!e)
>  		return -ENOMEM;
>  
> -- 
> 2.27.0
>
Christian König July 14, 2020, 11:40 a.m. UTC | #2
Am 14.07.20 um 12:49 schrieb Daniel Vetter:
> On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
>> My dma-fence lockdep annotations caught an inversion because we
>> allocate memory where we really shouldn't:
>>
>> 	kmem_cache_alloc+0x2b/0x6d0
>> 	amdgpu_fence_emit+0x30/0x330 [amdgpu]
>> 	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
>> 	amdgpu_job_run+0x10f/0x260 [amdgpu]
>> 	drm_sched_main+0x1b9/0x490 [gpu_sched]
>> 	kthread+0x12e/0x150
>>
>> Trouble right now is that lockdep only validates against GFP_FS, which
>> would be good enough for shrinkers. But for mmu_notifiers we actually
>> need !GFP_ATOMIC, since they can be called from any page laundering,
>> even if GFP_NOFS or GFP_NOIO are set.
>>
>> I guess we should improve the lockdep annotations for
>> fs_reclaim_acquire/release.
>>
>> Ofc real fix is to properly preallocate this fence and stuff it into
>> the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
>> the way.
>>
>> v2: Two more allocations in scheduler paths.
>>
>> Frist one:
>>
>> 	__kmalloc+0x58/0x720
>> 	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
>> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
>> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
>> 	drm_sched_main+0xf9/0x490 [gpu_sched]
>>
>> Second one:
>>
>> 	kmem_cache_alloc+0x2b/0x6d0
>> 	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
>> 	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
>> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
>> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
>> 	drm_sched_main+0xf9/0x490 [gpu_sched]
>>
>> Cc: linux-media@vger.kernel.org
>> Cc: linaro-mm-sig@lists.linaro.org
>> Cc: linux-rdma@vger.kernel.org
>> Cc: amd-gfx@lists.freedesktop.org
>> Cc: intel-gfx@lists.freedesktop.org
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Has anyone from amd side started looking into how to fix this properly?

Yeah I checked both and neither are any real problem.

> I looked a bit into fixing this with mempool, and the big guarantee we
> need is that
> - there's a hard upper limit on how many allocations we minimally need to
>    guarantee forward progress. And the entire vmid allocation and
>    amdgpu_sync_fence stuff kinda makes me question that's a valid
>    assumption.

We do have hard upper limits for those.

The VMID allocation could as well just return the fence instead of 
putting it into the sync object IIRC. So that just needs some cleanup 
and can avoid the allocation entirely.

The hardware fence is limited by the number of submissions we can have 
concurrently on the ring buffers, so also not a problem at all.

Regards,
Christian.

>
> - mempool_free must be called without any locks in the way which are held
>    while we call mempool_alloc. Otherwise we again have a nice deadlock
>    with no forward progress. I tried auditing that, but got lost in amdgpu
>    and scheduler code. Some lockdep annotations for mempool.c might help,
>    but they're not going to catch everything. Plus it would be again manual
>    annotations because this is yet another cross-release issue. So not sure
>    that helps at all.
>
> iow, not sure what to do here. Ideas?
>
> Cheers, Daniel
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
>>   3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 8d84975885cd..a089a827fdfe 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
>>   	uint32_t seq;
>>   	int r;
>>   
>> -	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
>> +	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
>>   	if (fence == NULL)
>>   		return -ENOMEM;
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> index 267fa45ddb66..a333ca2d4ddd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
>>   	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
>>   		return amdgpu_sync_fence(sync, ring->vmid_wait);
>>   
>> -	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
>> +	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
>>   	if (!fences)
>>   		return -ENOMEM;
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index 8ea6c49529e7..af22b526cec9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
>>   	if (amdgpu_sync_add_later(sync, f))
>>   		return 0;
>>   
>> -	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
>> +	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
>>   	if (!e)
>>   		return -ENOMEM;
>>   
>> -- 
>> 2.27.0
>>
Daniel Vetter July 14, 2020, 2:31 p.m. UTC | #3
On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
> Am 14.07.20 um 12:49 schrieb Daniel Vetter:
> > On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
> > > My dma-fence lockdep annotations caught an inversion because we
> > > allocate memory where we really shouldn't:
> > > 
> > > 	kmem_cache_alloc+0x2b/0x6d0
> > > 	amdgpu_fence_emit+0x30/0x330 [amdgpu]
> > > 	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> > > 	amdgpu_job_run+0x10f/0x260 [amdgpu]
> > > 	drm_sched_main+0x1b9/0x490 [gpu_sched]
> > > 	kthread+0x12e/0x150
> > > 
> > > Trouble right now is that lockdep only validates against GFP_FS, which
> > > would be good enough for shrinkers. But for mmu_notifiers we actually
> > > need !GFP_ATOMIC, since they can be called from any page laundering,
> > > even if GFP_NOFS or GFP_NOIO are set.
> > > 
> > > I guess we should improve the lockdep annotations for
> > > fs_reclaim_acquire/release.
> > > 
> > > Ofc real fix is to properly preallocate this fence and stuff it into
> > > the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> > > the way.
> > > 
> > > v2: Two more allocations in scheduler paths.
> > > 
> > > Frist one:
> > > 
> > > 	__kmalloc+0x58/0x720
> > > 	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> > > 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > > 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > > 	drm_sched_main+0xf9/0x490 [gpu_sched]
> > > 
> > > Second one:
> > > 
> > > 	kmem_cache_alloc+0x2b/0x6d0
> > > 	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> > > 	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> > > 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > > 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > > 	drm_sched_main+0xf9/0x490 [gpu_sched]
> > > 
> > > Cc: linux-media@vger.kernel.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > > Cc: linux-rdma@vger.kernel.org
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Has anyone from amd side started looking into how to fix this properly?
> 
> Yeah I checked both and neither are any real problem.

I'm confused ... do you mean "no real problem fixing them" or "not
actually a real problem"?

> > I looked a bit into fixing this with mempool, and the big guarantee we
> > need is that
> > - there's a hard upper limit on how many allocations we minimally need to
> >    guarantee forward progress. And the entire vmid allocation and
> >    amdgpu_sync_fence stuff kinda makes me question that's a valid
> >    assumption.
> 
> We do have hard upper limits for those.
> 
> The VMID allocation could as well just return the fence instead of putting
> it into the sync object IIRC. So that just needs some cleanup and can avoid
> the allocation entirely.

Yeah embedding should be simplest solution of all.

> The hardware fence is limited by the number of submissions we can have
> concurrently on the ring buffers, so also not a problem at all.

Ok that sounds good. Wrt releasing the memory again, is that also done
without any of the allocation-side locks held? I've seen some vmid manager
somewhere ...
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > - mempool_free must be called without any locks in the way which are held
> >    while we call mempool_alloc. Otherwise we again have a nice deadlock
> >    with no forward progress. I tried auditing that, but got lost in amdgpu
> >    and scheduler code. Some lockdep annotations for mempool.c might help,
> >    but they're not going to catch everything. Plus it would be again manual
> >    annotations because this is yet another cross-release issue. So not sure
> >    that helps at all.
> > 
> > iow, not sure what to do here. Ideas?
> > 
> > Cheers, Daniel
> > 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
> > >   3 files changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index 8d84975885cd..a089a827fdfe 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> > >   	uint32_t seq;
> > >   	int r;
> > > -	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> > > +	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> > >   	if (fence == NULL)
> > >   		return -ENOMEM;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > index 267fa45ddb66..a333ca2d4ddd 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> > >   	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> > >   		return amdgpu_sync_fence(sync, ring->vmid_wait);
> > > -	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> > > +	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> > >   	if (!fences)
> > >   		return -ENOMEM;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index 8ea6c49529e7..af22b526cec9 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
> > >   	if (amdgpu_sync_add_later(sync, f))
> > >   		return 0;
> > > -	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> > > +	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> > >   	if (!e)
> > >   		return -ENOMEM;
> > > -- 
> > > 2.27.0
> > > 
>
Christian König July 15, 2020, 9:17 a.m. UTC | #4
Am 14.07.20 um 16:31 schrieb Daniel Vetter:
> On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
>> Am 14.07.20 um 12:49 schrieb Daniel Vetter:
>>> On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
>>>> My dma-fence lockdep annotations caught an inversion because we
>>>> allocate memory where we really shouldn't:
>>>>
>>>> 	kmem_cache_alloc+0x2b/0x6d0
>>>> 	amdgpu_fence_emit+0x30/0x330 [amdgpu]
>>>> 	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
>>>> 	amdgpu_job_run+0x10f/0x260 [amdgpu]
>>>> 	drm_sched_main+0x1b9/0x490 [gpu_sched]
>>>> 	kthread+0x12e/0x150
>>>>
>>>> Trouble right now is that lockdep only validates against GFP_FS, which
>>>> would be good enough for shrinkers. But for mmu_notifiers we actually
>>>> need !GFP_ATOMIC, since they can be called from any page laundering,
>>>> even if GFP_NOFS or GFP_NOIO are set.
>>>>
>>>> I guess we should improve the lockdep annotations for
>>>> fs_reclaim_acquire/release.
>>>>
>>>> Ofc real fix is to properly preallocate this fence and stuff it into
>>>> the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
>>>> the way.
>>>>
>>>> v2: Two more allocations in scheduler paths.
>>>>
>>>> Frist one:
>>>>
>>>> 	__kmalloc+0x58/0x720
>>>> 	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
>>>> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
>>>> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
>>>> 	drm_sched_main+0xf9/0x490 [gpu_sched]
>>>>
>>>> Second one:
>>>>
>>>> 	kmem_cache_alloc+0x2b/0x6d0
>>>> 	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
>>>> 	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
>>>> 	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
>>>> 	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
>>>> 	drm_sched_main+0xf9/0x490 [gpu_sched]
>>>>
>>>> Cc: linux-media@vger.kernel.org
>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>> Cc: linux-rdma@vger.kernel.org
>>>> Cc: amd-gfx@lists.freedesktop.org
>>>> Cc: intel-gfx@lists.freedesktop.org
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> Has anyone from amd side started looking into how to fix this properly?
>> Yeah I checked both and neither are any real problem.
> I'm confused ... do you mean "no real problem fixing them" or "not
> actually a real problem"?

Both, at least the VMID stuff is trivial to avoid.

And the fence allocation is extremely unlikely. E.g. when we allocate a 
new one we previously most likely just freed one already.

>
>>> I looked a bit into fixing this with mempool, and the big guarantee we
>>> need is that
>>> - there's a hard upper limit on how many allocations we minimally need to
>>>     guarantee forward progress. And the entire vmid allocation and
>>>     amdgpu_sync_fence stuff kinda makes me question that's a valid
>>>     assumption.
>> We do have hard upper limits for those.
>>
>> The VMID allocation could as well just return the fence instead of putting
>> it into the sync object IIRC. So that just needs some cleanup and can avoid
>> the allocation entirely.
> Yeah embedding should be simplest solution of all.
>
>> The hardware fence is limited by the number of submissions we can have
>> concurrently on the ring buffers, so also not a problem at all.
> Ok that sounds good. Wrt releasing the memory again, is that also done
> without any of the allocation-side locks held? I've seen some vmid manager
> somewhere ...

Well that's the issue. We can't guarantee that for the hardware fence 
memory since it could be that we hold another reference during debugging 
IIRC.

Still looking if and how we could fix this. But as I said this problem 
is so extremely unlikely.

Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>> - mempool_free must be called without any locks in the way which are held
>>>     while we call mempool_alloc. Otherwise we again have a nice deadlock
>>>     with no forward progress. I tried auditing that, but got lost in amdgpu
>>>     and scheduler code. Some lockdep annotations for mempool.c might help,
>>>     but they're not going to catch everything. Plus it would be again manual
>>>     annotations because this is yet another cross-release issue. So not sure
>>>     that helps at all.
>>>
>>> iow, not sure what to do here. Ideas?
>>>
>>> Cheers, Daniel
>>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
>>>>    3 files changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 8d84975885cd..a089a827fdfe 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
>>>>    	uint32_t seq;
>>>>    	int r;
>>>> -	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
>>>> +	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
>>>>    	if (fence == NULL)
>>>>    		return -ENOMEM;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>>>> index 267fa45ddb66..a333ca2d4ddd 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>>>> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
>>>>    	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
>>>>    		return amdgpu_sync_fence(sync, ring->vmid_wait);
>>>> -	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
>>>> +	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
>>>>    	if (!fences)
>>>>    		return -ENOMEM;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> index 8ea6c49529e7..af22b526cec9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
>>>>    	if (amdgpu_sync_add_later(sync, f))
>>>>    		return 0;
>>>> -	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
>>>> +	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
>>>>    	if (!e)
>>>>    		return -ENOMEM;
>>>> -- 
>>>> 2.27.0
>>>>
Daniel Vetter July 15, 2020, 11:53 a.m. UTC | #5
On Wed, Jul 15, 2020 at 11:17 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 14.07.20 um 16:31 schrieb Daniel Vetter:
> > On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
> >> Am 14.07.20 um 12:49 schrieb Daniel Vetter:
> >>> On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
> >>>> My dma-fence lockdep annotations caught an inversion because we
> >>>> allocate memory where we really shouldn't:
> >>>>
> >>>>    kmem_cache_alloc+0x2b/0x6d0
> >>>>    amdgpu_fence_emit+0x30/0x330 [amdgpu]
> >>>>    amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> >>>>    amdgpu_job_run+0x10f/0x260 [amdgpu]
> >>>>    drm_sched_main+0x1b9/0x490 [gpu_sched]
> >>>>    kthread+0x12e/0x150
> >>>>
> >>>> Trouble right now is that lockdep only validates against GFP_FS, which
> >>>> would be good enough for shrinkers. But for mmu_notifiers we actually
> >>>> need !GFP_ATOMIC, since they can be called from any page laundering,
> >>>> even if GFP_NOFS or GFP_NOIO are set.
> >>>>
> >>>> I guess we should improve the lockdep annotations for
> >>>> fs_reclaim_acquire/release.
> >>>>
> >>>> Ofc real fix is to properly preallocate this fence and stuff it into
> >>>> the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> >>>> the way.
> >>>>
> >>>> v2: Two more allocations in scheduler paths.
> >>>>
> >>>> Frist one:
> >>>>
> >>>>    __kmalloc+0x58/0x720
> >>>>    amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> >>>>    amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >>>>    drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >>>>    drm_sched_main+0xf9/0x490 [gpu_sched]
> >>>>
> >>>> Second one:
> >>>>
> >>>>    kmem_cache_alloc+0x2b/0x6d0
> >>>>    amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> >>>>    amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> >>>>    amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >>>>    drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >>>>    drm_sched_main+0xf9/0x490 [gpu_sched]
> >>>>
> >>>> Cc: linux-media@vger.kernel.org
> >>>> Cc: linaro-mm-sig@lists.linaro.org
> >>>> Cc: linux-rdma@vger.kernel.org
> >>>> Cc: amd-gfx@lists.freedesktop.org
> >>>> Cc: intel-gfx@lists.freedesktop.org
> >>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>> Cc: Christian König <christian.koenig@amd.com>
> >>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>> Has anyone from amd side started looking into how to fix this properly?
> >> Yeah I checked both and neither are any real problem.
> > I'm confused ... do you mean "no real problem fixing them" or "not
> > actually a real problem"?
>
> Both, at least the VMID stuff is trivial to avoid.
>
> And the fence allocation is extremely unlikely. E.g. when we allocate a
> new one we previously most likely just freed one already.

Yeah I think debugging we can avoid, just stop debugging if things get
hung up like that. So mempool for the hw fences should be perfectly
fine.

The vmid stuff I don't really understand enough, but the hw fence
stuff I think I grok, plus other scheduler users need that too from a
quick look. I might be tackling that one (maybe put the mempool
outright into drm_scheduler code as a helper), except if you have
patches already in the works. vmid I'll leave to you guys :-)

-Daniel

>
> >
> >>> I looked a bit into fixing this with mempool, and the big guarantee we
> >>> need is that
> >>> - there's a hard upper limit on how many allocations we minimally need to
> >>>     guarantee forward progress. And the entire vmid allocation and
> >>>     amdgpu_sync_fence stuff kinda makes me question that's a valid
> >>>     assumption.
> >> We do have hard upper limits for those.
> >>
> >> The VMID allocation could as well just return the fence instead of putting
> >> it into the sync object IIRC. So that just needs some cleanup and can avoid
> >> the allocation entirely.
> > Yeah embedding should be simplest solution of all.
> >
> >> The hardware fence is limited by the number of submissions we can have
> >> concurrently on the ring buffers, so also not a problem at all.
> > Ok that sounds good. Wrt releasing the memory again, is that also done
> > without any of the allocation-side locks held? I've seen some vmid manager
> > somewhere ...
>
> Well that's the issue. We can't guarantee that for the hardware fence
> memory since it could be that we hold another reference during debugging
> IIRC.
>
> Still looking if and how we could fix this. But as I said this problem
> is so extremely unlikely.
>
> Christian.
>
> > -Daniel
> >
> >> Regards,
> >> Christian.
> >>
> >>> - mempool_free must be called without any locks in the way which are held
> >>>     while we call mempool_alloc. Otherwise we again have a nice deadlock
> >>>     with no forward progress. I tried auditing that, but got lost in amdgpu
> >>>     and scheduler code. Some lockdep annotations for mempool.c might help,
> >>>     but they're not going to catch everything. Plus it would be again manual
> >>>     annotations because this is yet another cross-release issue. So not sure
> >>>     that helps at all.
> >>>
> >>> iow, not sure what to do here. Ideas?
> >>>
> >>> Cheers, Daniel
> >>>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
> >>>>    3 files changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> index 8d84975885cd..a089a827fdfe 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> >>>>            uint32_t seq;
> >>>>            int r;
> >>>> -  fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> >>>> +  fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> >>>>            if (fence == NULL)
> >>>>                    return -ENOMEM;
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> index 267fa45ddb66..a333ca2d4ddd 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> >>>>            if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> >>>>                    return amdgpu_sync_fence(sync, ring->vmid_wait);
> >>>> -  fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> >>>> +  fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> >>>>            if (!fences)
> >>>>                    return -ENOMEM;
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> index 8ea6c49529e7..af22b526cec9 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
> >>>>            if (amdgpu_sync_add_later(sync, f))
> >>>>                    return 0;
> >>>> -  e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> >>>> +  e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> >>>>            if (!e)
> >>>>                    return -ENOMEM;
> >>>> --
> >>>> 2.27.0
> >>>>
>

Patch
diff mbox series

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 8d84975885cd..a089a827fdfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@  int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
 	uint32_t seq;
 	int r;
 
-	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
 	if (fence == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 267fa45ddb66..a333ca2d4ddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@  static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
 	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
 		return amdgpu_sync_fence(sync, ring->vmid_wait);
 
-	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
 	if (!fences)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 8ea6c49529e7..af22b526cec9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -160,7 +160,7 @@  int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
 	if (amdgpu_sync_add_later(sync, f))
 		return 0;
 
-	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
 	if (!e)
 		return -ENOMEM;