[RFC,10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code

Message ID 20200512085944.222637-11-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Series dma-fence lockdep annotations

Commit Message

Daniel Vetter May 12, 2020, 8:59 a.m. UTC
My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:

	kmem_cache_alloc+0x2b/0x6d0
	amdgpu_fence_emit+0x30/0x330 [amdgpu]
	amdgpu_ib_schedule+0x306/0x550 [amdgpu]
	amdgpu_job_run+0x10f/0x260 [amdgpu]
	drm_sched_main+0x1b9/0x490 [gpu_sched]
	kthread+0x12e/0x150

Trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually
need !GFP_ATOMIC, since they can be called from any page laundering,
even if GFP_NOFS or GFP_NOIO are set.

I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.

Ofc real fix is to properly preallocate this fence and stuff it into
the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
the way.

v2: Two more allocations in scheduler paths.

First one:

	__kmalloc+0x58/0x720
	amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
	drm_sched_main+0xf9/0x490 [gpu_sched]

Second one:

	kmem_cache_alloc+0x2b/0x6d0
	amdgpu_sync_fence+0x7e/0x110 [amdgpu]
	amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
	amdgpu_job_dependency+0xf9/0x120 [amdgpu]
	drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
	drm_sched_main+0xf9/0x490 [gpu_sched]

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
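
Editor's sketch of the "real fix" named in the commit message: reserve the fence when the job is created and only consume it in the run path. This is a minimal userspace model under stated assumptions, not amdgpu code. The names toy_fence, toy_job, toy_job_init, toy_job_run and toy_job_fini are hypothetical, and calloc() stands in for a blocking kernel allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's fence and job objects. */
struct toy_fence {
	unsigned int seq;
	bool signaled;
};

struct toy_job {
	struct toy_fence *prealloc_fence; /* reserved at submit time */
};

/*
 * Submit path (think: ioctl context). Blocking allocations are fine here,
 * so a failure can be reported to the caller before anything is queued.
 */
int toy_job_init(struct toy_job *job)
{
	job->prealloc_fence = calloc(1, sizeof(*job->prealloc_fence));
	return job->prealloc_fence ? 0 : -1;
}

/*
 * Run path (think: scheduler thread). It only consumes the reserved fence
 * and never calls the allocator, so it cannot end up waiting on reclaim.
 */
struct toy_fence *toy_job_run(struct toy_job *job, unsigned int seq)
{
	struct toy_fence *fence = job->prealloc_fence;

	assert(fence != NULL);      /* toy_job_init() must have succeeded */
	job->prealloc_fence = NULL; /* ownership moves to the caller */
	fence->seq = seq;
	return fence;
}

/* Release a fence that was never consumed, e.g. when a job is cancelled. */
void toy_job_fini(struct toy_job *job)
{
	free(job->prealloc_fence);
	job->prealloc_fence = NULL;
}
```

The point of the split is that the only step that can fail or block happens before the job enters the scheduler; whether the fence can really be sized and initialised that early in amdgpu is not something this sketch settles.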

Comments

Christian König May 12, 2020, 3:56 p.m. UTC | #1
Huh, what? Offhand that doesn't look correct to me.

Why the heck should this be an atomic context? If that's correct,
allocating memory is the least of the problems we have.

Regards,
Christian.

On 12.05.20 at 10:59, Daniel Vetter wrote:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index d878fe7fee51..055b47241bb1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
>   	uint32_t seq;
>   	int r;
>   
> -	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> +	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
>   	if (fence == NULL)
>   		return -ENOMEM;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index fe92dcd94d4a..fdcd6659f5ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
>   	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
>   		return amdgpu_sync_fence(sync, ring->vmid_wait, false);
>   
> -	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> +	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
>   	if (!fences)
>   		return -ENOMEM;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index b87ca171986a..330476cc0c86 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
>   	if (amdgpu_sync_add_later(sync, f, explicit))
>   		return 0;
>   
> -	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> +	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
>   	if (!e)
>   		return -ENOMEM;
>
Daniel Vetter May 12, 2020, 4:20 p.m. UTC | #2
On Tue, May 12, 2020 at 5:56 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Huh, what? Offhand that doesn't look correct to me.

It's not about atomic context; GFP_ATOMIC is just the only shotgun we
have to avoid direct reclaim. And direct reclaim might need to call
into your mmu notifier, which might need to wait on a fence, which is
never going to signal because your scheduler is stuck.

Note that all the explanations for the deadlocks I'm trying to hunt
here are in the other patches. The driver patches are more
informational, so I left them rather bare-bones: they just shut up
lockdep so I can get through the entire driver and all major areas
(scheduler, reset, modeset code).

Now you can do something like GFP_NOFS, but the only reason that
works is that the direct reclaim annotations
(fs_reclaim_acquire/release) only validate against __GFP_FS, and not
against any of the other flags. We should probably add some lockdep
annotations so that __GFP_RECLAIM is annotated against the
__mmu_notifier_invalidate_range_start_map lockdep map I've recently
added for mmu notifiers. End result (assuming I'm not mixing anything
up here, this is all rather tricky stuff): GFP_ATOMIC is the only kind
of memory allocation you can do.
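
Editor's sketch of the flag argument in the paragraph above, as a small userspace model. The bit values are made up and every TOY_ name is hypothetical. Only the composition of the masks is meant to follow include/linux/gfp.h as it looked around the time of this thread, and the model of the lockdep check (direct reclaim allowed and __GFP_FS set) is a simplification of what fs_reclaim_acquire does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bit values, NOT the kernel's; only the mask composition matters. */
#define TOY_GFP_IO             0x01u
#define TOY_GFP_FS             0x02u
#define TOY_GFP_DIRECT_RECLAIM 0x04u
#define TOY_GFP_KSWAPD_RECLAIM 0x08u
#define TOY_GFP_HIGH           0x10u

#define TOY_GFP_RECLAIM (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOFS    (TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_NOIO    (TOY_GFP_RECLAIM)
/* The real GFP_ATOMIC of that era carried one more bit, omitted here. */
#define TOY_GFP_ATOMIC  (TOY_GFP_HIGH | TOY_GFP_KSWAPD_RECLAIM)

/* The property the thread cares about: atomic allocations never reclaim. */
static_assert((TOY_GFP_ATOMIC & TOY_GFP_DIRECT_RECLAIM) == 0,
	      "atomic allocations must not enter direct reclaim");

/* Can the allocating thread itself be pulled into direct reclaim? */
bool toy_may_direct_reclaim(unsigned int gfp)
{
	return (gfp & TOY_GFP_DIRECT_RECLAIM) != 0;
}

/* Simplified model of when the fs_reclaim annotation is exercised. */
bool toy_fs_reclaim_checked(unsigned int gfp)
{
	return toy_may_direct_reclaim(gfp) && (gfp & TOY_GFP_FS) != 0;
}

/* Allocation may reclaim, yet the annotation stays silent about it. */
bool toy_unchecked_hazard(unsigned int gfp)
{
	return toy_may_direct_reclaim(gfp) && !toy_fs_reclaim_checked(gfp);
}
```

In this model GFP_NOFS and GFP_NOIO land in the "unchecked hazard" bucket: they keep lockdep quiet without keeping the allocation out of direct reclaim. GFP_ATOMIC is the only mask shown that avoids direct reclaim altogether.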

> Why the heck should this be an atomic context? If that's correct
> allocating memory is the least of the problems we have.

It's not about atomic, it's !__GFP_RECLAIM. Which more or less is
GFP_ATOMIC. Correct fix is probably GFP_ATOMIC + a mempool for the
scheduler fixes so that if you can't allocate them for some reason,
you at least know that your scheduler should eventually retire
some of them, which you can then pick up from the mempool to guarantee
forward progress.
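
Editor's sketch of the "non-blocking allocation plus mempool" idea from the paragraph above. This is a userspace model, not the kernel mempool API: toy_pool and its functions are hypothetical, malloc() stands in for the allocator, and the fail_alloc knob simulates memory pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MIN 2 /* hypothetical reserve size */

struct toy_pool {
	void *reserve[TOY_POOL_MIN];
	size_t nr;        /* elements currently held in reserve */
	size_t elem_size;
	bool fail_alloc;  /* test knob: pretend the allocator is exhausted */
};

/* Stand-in for a non-blocking allocation that is allowed to fail. */
void *toy_try_alloc(struct toy_pool *pool)
{
	return pool->fail_alloc ? NULL : malloc(pool->elem_size);
}

/* Drop whatever is left in the reserve. */
void toy_pool_destroy(struct toy_pool *pool)
{
	while (pool->nr > 0)
		free(pool->reserve[--pool->nr]);
}

/* Fill the reserve up front, from a context where blocking is fine. */
int toy_pool_init(struct toy_pool *pool, size_t elem_size)
{
	pool->nr = 0;
	pool->elem_size = elem_size;
	pool->fail_alloc = false;
	while (pool->nr < TOY_POOL_MIN) {
		void *elem = malloc(elem_size);

		if (!elem) {
			toy_pool_destroy(pool);
			return -1;
		}
		pool->reserve[pool->nr++] = elem;
	}
	return 0;
}

/* Try the allocator first, then dip into the reserve. */
void *toy_pool_alloc(struct toy_pool *pool)
{
	void *elem = toy_try_alloc(pool);

	if (!elem && pool->nr > 0)
		elem = pool->reserve[--pool->nr];
	return elem; /* NULL: caller has to wait for an element to retire */
}

/* Retiring an element refills the reserve before memory is given back. */
void toy_pool_free(struct toy_pool *pool, void *elem)
{
	assert(elem != NULL);
	if (pool->nr < TOY_POOL_MIN)
		pool->reserve[pool->nr++] = elem;
	else
		free(elem);
}
```

The forward-progress argument is visible in toy_pool_free(): as long as in-flight work eventually retires, the reserve is refilled, so a caller that got NULL can retry without ever entering reclaim.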

But I really didn't dig into details of the code, this was just a quick hack.

So sleeping and taking all kinds of locks (but not all, e.g.
dma_resv_lock and drm_modeset_lock are no-go) is still totally ok.
Just think

#define GFP_NO_DIRECT_RECLAIM GFP_ATOMIC

Cheers, Daniel

Daniel Vetter May 12, 2020, 4:27 p.m. UTC | #3
Maybe slightly different take that's easier to understand: You've
already made the observation that anything holding adev->notifier_lock
isn't allowed to allocate memory (well GFP_ATOMIC is ok, like here).

Only thing I'm adding is that the situation is a lot worse. Plus the
lockdep annotations to help us catch these issues.
-Daniel

Christian König May 12, 2020, 5:31 p.m. UTC | #4
Ah!

So we can't allocate memory while scheduling anything because it could 
be that memory reclaim is waiting for the scheduler to finish pushing 
things to the hardware, right?

Indeed a nice problem, I hadn't noticed that one.

Christian.

Daniel Vetter May 12, 2020, 6:34 p.m. UTC | #5
On Tue, May 12, 2020 at 7:31 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Ah!
>
> So we can't allocate memory while scheduling anything because it could
> be that memory reclaim is waiting for the scheduler to finish pushing
> things to the hardware, right?

Yup, that's my understanding. But like with all things kernel, I'm not
sure, so I tried to come up with some annotations. One of them is the
memory allocation stuff, but it also did find the modeset/dc related
issues in TDR/GPU recovery, so I think overall it's fairly sound. But
the memory side definitely needs more discussion (like really the
entire thing I'm proposing here, hence RFC).

My rough hope here is that first we figure out the exact current
semantics and nail them down in lockdep annotations and kerneldoc. And
then we need to figure out how to step-by-step land this, since lots
of drivers will have smaller and bigger issues all over.

I tried to search our CI history for the memory allocation issue
specifically, but unfortunately we don't retain many of the full logs
because there are so many of them. But the more general issue is
fairly common: taking locks somewhere in the path towards completing a
fence (tail end of CS ioctl, scheduler, TDR, modeset code since that
generates fences too for at least Android, ...) that are also held
while waiting for said fences to complete. I've seen those way too
often, and up to now lockdep has simply been silent about them.

> Indeed a nice problem, haven't noticed that one.

It's pretty glorious indeed :-)

Cheers, Daniel

>
> Christian.
>
> Am 12.05.20 um 18:27 schrieb Daniel Vetter:
> > On Tue, May 12, 2020 at 6:20 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >> On Tue, May 12, 2020 at 5:56 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Hui what? Of hand that doesn't looks correct to me.
> >> It's not GFP_ATOMIC, it's just that GFP_ATOMIC is the only shotgun we
> >> have to avoid direct reclaim. And direct reclaim might need to call
> >> into your mmu notifier, which might need to wait on a fence, which is
> >> never going to happen because your scheduler is stuck.
> >>
> >> Note that all the explanations for the deadlocks and stuff I'm trying
> >> to hunt here are in the other patches, the driver ones are more
> >> informational, so I left these here rather bare-bones to shut up
> >> lockdep so I can get through the entire driver and all major areas
> >> (scheduler, reset, modeset code).
> >>
> >> Now you can do something like GFP_NOFS, but the only reasons that
> >> works is because the direct reclaim annotations
> >> (fs_reclaim_acquire/release) only validates against __GFP_FS, and not
> >> against any of the other flags. We should probably add some lockdep
> >> annotations so that __GFP_RECLAIM is annotated against the
> >> __mmu_notifier_invalidate_range_start_map lockdep map I've recently
> >> added for mmu notifiers. End result (assuming I'm not mixing anything
> >> up here, this is all rather tricky stuff): GFP_ATOMIC is the only kind
> >> of memory allocation you can do.
> >>
> >>> Why the heck should this be an atomic context? If that's correct
> >>> allocating memory is the least of the problems we have.
> >> It's not about atomic, it's !__GFP_RECLAIM. Which more or less is
> >> GFP_ATOMIC. Correct fix is probably GFP_ATOMIC + a mempool for the
> >> scheduler fixes so that if you can't allocate them for some reason,
> >> you at least know that your scheduler should eventually retire retire
> >> some of them, which you can then pick up from the mempool to guarantee
> >> forward progress.
> >>
> >> But I really didn't dig into details of the code, this was just a quick hack.
> >>
> >> So sleeping and taking all kinds of locks (but not all, e.g.
> >> dma_resv_lock and drm_modeset_lock are no-go) is still totally ok.
> >> Just think
> >>
> >> #define GFP_NO_DIRECT_RECLAIM GFP_ATOMIC
> > Maybe slightly different take that's easier to understand: You've
> > already made the observation that anything holding adev->notifier_lock
> > isn't allowed to allocate memory (well GFP_ATOMIC is ok, like here).
> >
> > Only thing I'm adding is that the situation is a lot worse. Plus the
> > lockdep annotations to help us catch these issues.
> > -Daniel
> >
> >> Cheers, Daniel
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>> Am 12.05.20 um 10:59 schrieb Daniel Vetter:
> >>>> My dma-fence lockdep annotations caught an inversion because we
> >>>> allocate memory where we really shouldn't:
> >>>>
> >>>>        kmem_cache_alloc+0x2b/0x6d0
> >>>>        amdgpu_fence_emit+0x30/0x330 [amdgpu]
> >>>>        amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> >>>>        amdgpu_job_run+0x10f/0x260 [amdgpu]
> >>>>        drm_sched_main+0x1b9/0x490 [gpu_sched]
> >>>>        kthread+0x12e/0x150
> >>>>
> >>>> Trouble right now is that lockdep only validates against GFP_FS, which
> >>>> would be good enough for shrinkers. But for mmu_notifiers we actually
> >>>> need !GFP_ATOMIC, since they can be called from any page laundering,
> >>>> even if GFP_NOFS or GFP_NOIO are set.
> >>>>
> >>>> I guess we should improve the lockdep annotations for
> >>>> fs_reclaim_acquire/release.
> >>>>
> >>>> Ofc real fix is to properly preallocate this fence and stuff it into
> >>>> the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> >>>> the way.
> >>>>
> >>>> v2: Two more allocations in scheduler paths.
> >>>>
> >>>> First one:
> >>>>
> >>>>        __kmalloc+0x58/0x720
> >>>>        amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> >>>>        amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >>>>        drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >>>>        drm_sched_main+0xf9/0x490 [gpu_sched]
> >>>>
> >>>> Second one:
> >>>>
> >>>>        kmem_cache_alloc+0x2b/0x6d0
> >>>>        amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> >>>>        amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> >>>>        amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >>>>        drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >>>>        drm_sched_main+0xf9/0x490 [gpu_sched]
> >>>>
> >>>> Cc: linux-media@vger.kernel.org
> >>>> Cc: linaro-mm-sig@lists.linaro.org
> >>>> Cc: linux-rdma@vger.kernel.org
> >>>> Cc: amd-gfx@lists.freedesktop.org
> >>>> Cc: intel-gfx@lists.freedesktop.org
> >>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>> Cc: Christian König <christian.koenig@amd.com>
> >>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
> >>>>    3 files changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> index d878fe7fee51..055b47241bb1 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> >>>>        uint32_t seq;
> >>>>        int r;
> >>>>
> >>>> -     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> >>>> +     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> >>>>        if (fence == NULL)
> >>>>                return -ENOMEM;
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> index fe92dcd94d4a..fdcd6659f5ad 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> >>>> @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> >>>>        if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> >>>>                return amdgpu_sync_fence(sync, ring->vmid_wait, false);
> >>>>
> >>>> -     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> >>>> +     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> >>>>        if (!fences)
> >>>>                return -ENOMEM;
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> index b87ca171986a..330476cc0c86 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>>> @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
> >>>>        if (amdgpu_sync_add_later(sync, f, explicit))
> >>>>                return 0;
> >>>>
> >>>> -     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> >>>> +     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> >>>>        if (!e)
> >>>>                return -ENOMEM;
> >>>>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch/
> >
> >
>
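The "GFP_ATOMIC + a mempool" idea from the quoted reply can be modelled in plain userspace C. This is only a sketch under stated assumptions: the names `fence_pool`, `fence_pool_alloc`, `RESERVE_SIZE` and the `fail_fast_path` knob are hypothetical and exist neither in the kernel nor in amdgpu, and `malloc()` merely stands in for a non-sleeping allocation. The allocation path first tries the normal allocator and falls back to a preallocated reserve. Freeing refills the reserve first, so retired elements guarantee forward progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: none of these names exist in the kernel or amdgpu. */
#define RESERVE_SIZE 4

struct fence_pool {
	void *reserve[RESERVE_SIZE];	/* elements preallocated at init time */
	size_t nr_free;			/* number of filled reserve slots */
	size_t elem_size;
	int fail_fast_path;		/* test knob: pretend the atomic allocation failed */
};

/* Setup path: runs where a blocking allocation is still allowed. */
static int fence_pool_init(struct fence_pool *pool, size_t elem_size)
{
	pool->nr_free = 0;
	pool->elem_size = elem_size;
	pool->fail_fast_path = 0;

	while (pool->nr_free < RESERVE_SIZE) {
		void *p = malloc(elem_size);

		if (!p) {
			/* Undo the partial fill before reporting failure. */
			while (pool->nr_free)
				free(pool->reserve[--pool->nr_free]);
			return -1;
		}
		pool->reserve[pool->nr_free++] = p;
	}
	return 0;
}

/* Scheduler path: never blocks; tries the allocator, then the reserve. */
static void *fence_pool_alloc(struct fence_pool *pool)
{
	void *p = pool->fail_fast_path ? NULL : malloc(pool->elem_size);

	if (p)
		return p;
	if (pool->nr_free)
		return pool->reserve[--pool->nr_free];
	return NULL;	/* caller has to wait until an element is retired */
}

/* Retire path: refill the reserve before giving memory back. */
static void fence_pool_free(struct fence_pool *pool, void *p)
{
	assert(p != NULL);
	if (pool->nr_free < RESERVE_SIZE)
		pool->reserve[pool->nr_free++] = p;
	else
		free(p);
}

static void fence_pool_destroy(struct fence_pool *pool)
{
	while (pool->nr_free)
		free(pool->reserve[--pool->nr_free]);
}
```

In the kernel, the corresponding building blocks would be the mempool API (`mempool_create_slab_pool()`, `mempool_alloc()`, `mempool_free()`). How that would be wired into the amdgpu scheduler paths is not worked out in this thread.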

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d878fe7fee51..055b47241bb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@  int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
 	uint32_t seq;
 	int r;
 
-	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+	fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
 	if (fence == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index fe92dcd94d4a..fdcd6659f5ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@  static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
 	if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
 		return amdgpu_sync_fence(sync, ring->vmid_wait, false);
 
-	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+	fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
 	if (!fences)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index b87ca171986a..330476cc0c86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -168,7 +168,7 @@  int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
 	if (amdgpu_sync_add_later(sync, f, explicit))
 		return 0;
 
-	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+	e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
 	if (!e)
 		return -ENOMEM;
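
The commit message names preallocation as the real fix: allocate the fence at submission time and store it in the job, so the scheduler path never allocates. A minimal userspace sketch of that ownership pattern follows. The `struct job`, `struct fence`, `job_create()`, `job_run()` and `job_free()` names are hypothetical stand-ins, not the amdgpu structures or functions, and `calloc()` stands in for a sleeping kernel allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the amdgpu or dma-fence structures. */
struct fence {
	unsigned int seq;
};

struct job {
	struct fence *fence;	/* preallocated at submission time */
};

/* Submission path: a blocking allocation is still allowed here. */
static struct job *job_create(void)
{
	struct job *job = malloc(sizeof(*job));

	if (!job)
		return NULL;

	job->fence = calloc(1, sizeof(*job->fence));
	if (!job->fence) {
		free(job);
		return NULL;
	}
	return job;
}

/* Scheduler path: no allocation, only takes over the preallocated fence. */
static struct fence *job_run(struct job *job, unsigned int seq)
{
	struct fence *f = job->fence;

	assert(f != NULL);	/* a job must not be run twice */
	job->fence = NULL;	/* ownership moves to the caller */
	f->seq = seq;
	return f;
}

static void job_free(struct job *job)
{
	free(job->fence);	/* free(NULL) is a no-op if the fence was consumed */
	free(job);
}
```

With this split, an out-of-memory condition is reported at submission, where the caller can still handle `-ENOMEM`, instead of inside the scheduler thread.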