[v3,04/12] drm/ttm: Set dma addr to null after free

Message ID 1605936082-3099-5-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State New, archived
Series RFC Support hot device unplug in amdgpu

Commit Message

Andrey Grodzovsky Nov. 21, 2020, 5:21 a.m. UTC
Fixes oops.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Christian König Nov. 21, 2020, 2:13 p.m. UTC | #1
Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> Fixes oops.

That file doesn't even exist any more. What oops should this fix?

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index b40a467..b0df328 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
>   		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>   			       DMA_BIDIRECTIONAL);
>   
> +		tt->dma_address[i] = 0;
> +
>   		i += num_pages;
>   	}
>   	ttm_pool_unpopulate(&tt->ttm);
Andrey Grodzovsky Nov. 23, 2020, 5:15 a.m. UTC | #2
On 11/21/20 9:13 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> Fixes oops.
>
> That file doesn't even exist any more. What oops should this fix?


Which file?
We set dma_address to NULL in every other place after unmap, so that if the
DMA address was already unmapped we skip it the next time we enter
ttm_unmap_and_unpopulate_pages with the same tt for some reason.
The oops happens with IOMMU enabled. The device is removed from its IOMMU group
during PCI remove, but the BOs are all still alive if a user mode client holds
a reference to the drm file.
Later, when the reference is dropped and device fini happens, I get an oops in
ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group
structures are already gone.
Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device, together
with this patch, solves the oops.

Andrey


>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> index b40a467..b0df328 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, 
>> struct ttm_dma_tt *tt)
>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>>                      DMA_BIDIRECTIONAL);
>>   +        tt->dma_address[i] = 0;
>> +
>>           i += num_pages;
>>       }
>>       ttm_pool_unpopulate(&tt->ttm);
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
Christian König Nov. 23, 2020, 8:04 a.m. UTC | #3
Am 23.11.20 um 06:15 schrieb Andrey Grodzovsky:
>
> On 11/21/20 9:13 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> Fixes oops.
>>
>> That file doesn't even exist any more. What oops should this fix?
>
>
> Which file?

ttm_page_alloc.c

I've rewritten the whole page pool from scratch upstream.

> We set dma_address to NULL in every other place after unmap, so that
> if the DMA address was already unmapped we skip it the next time we
> enter ttm_unmap_and_unpopulate_pages with the same tt for some reason.

Dave and I already fixed that as well by having a flag preventing double 
unpopulate.

> The oops happens with IOMMU enabled. The device is removed from its
> IOMMU group during PCI remove, but the BOs are all still alive if a
> user mode client holds a reference to the drm file.
> Later, when the reference is dropped and device fini happens, I get an
> oops in ttm_unmap_and_unpopulate_pages->dma_unmap_page because the
> IOMMU group structures are already gone.
> Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device,
> together with this patch, solves the oops.

It should be sufficient to unpopulate all BOs now.

Maybe you should rebase the patches on drm-misc-next.

Christian.

>
> Andrey
>
>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> index b40a467..b0df328 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct 
>>> device *dev, struct ttm_dma_tt *tt)
>>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * 
>>> PAGE_SIZE,
>>>                      DMA_BIDIRECTIONAL);
>>>   +        tt->dma_address[i] = 0;
>>> +
>>>           i += num_pages;
>>>       }
>>>       ttm_pool_unpopulate(&tt->ttm);

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index b40a467..b0df328 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -1160,6 +1160,8 @@  void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
 		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
 
+		tt->dma_address[i] = 0;
+
 		i += num_pages;
 	}
 	ttm_pool_unpopulate(&tt->ttm);