Message ID | 1605936082-3099-5-git-send-email-andrey.grodzovsky@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | RFC Support hot device unplug in amdgpu |
On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
> Fixes oops.

That file doesn't even exist any more. What oops should this fix?

> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index b40a467..b0df328 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
>  		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>  			       DMA_BIDIRECTIONAL);
>
> +		tt->dma_address[i] = 0;
> +
>  		i += num_pages;
>  	}
>  	ttm_pool_unpopulate(&tt->ttm);
On 11/21/20 9:13 AM, Christian König wrote:
> On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
>> Fixes oops.
>
> That file doesn't even exist any more. What oops should this fix?

Which file? We set dma_address to NULL in every other place after unmap, so that if a DMA address was already unmapped we skip it the next time we enter ttm_unmap_and_unpopulate_pages with the same tt for some reason.

The oops happens with IOMMU enabled. The device is removed from its IOMMU group during PCI remove, but the BOs are all still alive if a user mode client holds a reference to the drm file. Later, when the reference is dropped and device fini happens, I get an oops in ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group structures are already gone.

Patch [11/12] "drm/amdgpu: Register IOMMU topology notifier per device", together with this patch, solves the oops.

Andrey
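The "clear the address after unmap so a later pass skips it" idea from the reply above can be illustrated with a small user-space sketch. This is not the TTM code: `mock_dma_tt`, `mock_unmap`, `mock_unmap_all` and `NUM_PAGES` are hypothetical names invented for the illustration, and the mock does no real DMA work. It assumes that an address of 0 means "not mapped".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PAGES 4

/* Hypothetical stand-in for an object holding per-page DMA addresses. */
struct mock_dma_tt {
	uint64_t dma_address[NUM_PAGES];
};

/* Counts how often the mock unmap routine was actually invoked. */
static int unmap_calls;

/* Mock unmap: only records the call, it performs no real unmapping. */
static void mock_unmap(uint64_t addr)
{
	(void)addr;
	unmap_calls++;
}

/*
 * Unmap every entry that still holds a mapping and clear it afterwards.
 * Assumption for this sketch: address 0 means "not mapped".
 */
static void mock_unmap_all(struct mock_dma_tt *tt)
{
	assert(tt != NULL);

	for (size_t i = 0; i < NUM_PAGES; i++) {
		if (!tt->dma_address[i])
			continue;	/* already unmapped or never mapped */

		mock_unmap(tt->dma_address[i]);
		tt->dma_address[i] = 0;	/* later passes skip this entry */
	}
}
```

Calling `mock_unmap_all()` twice on the same object invokes `mock_unmap()` only for entries that were non-zero on the first pass; the second pass finds every entry cleared and does nothing.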
On 23.11.20 at 06:15, Andrey Grodzovsky wrote:
>
> On 11/21/20 9:13 AM, Christian König wrote:
>> On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
>>> Fixes oops.
>>
>> That file doesn't even exist any more. What oops should this fix?
>
> Which file?

ttm_page_alloc.c

I've rewritten the whole page pool from scratch upstream.

> We set dma_address to NULL in every other place after unmap, so that
> if a DMA address was already unmapped we skip it the next time we enter
> ttm_unmap_and_unpopulate_pages with the same tt for some reason.

Dave and I already fixed that as well by having a flag preventing double unpopulate.

> The oops happens with IOMMU enabled. The device is removed from its
> IOMMU group during PCI remove, but the BOs are all still alive if a
> user mode client holds a reference to the drm file.
> Later, when the reference is dropped and device fini happens, I get an
> oops in ttm_unmap_and_unpopulate_pages->dma_unmap_page because the
> IOMMU group structures are already gone.
> Patch [11/12] "drm/amdgpu: Register IOMMU topology notifier per device",
> together with this patch, solves the oops.

It should be sufficient to unpopulate all BOs now. Maybe you should rebase the patches on drm-misc-next.

Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
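The alternative mentioned in this reply, a flag that prevents a double unpopulate, can be sketched in user space as follows. The thread does not name the flag or show the upstream code, so `mock_flag_tt`, `populated`, `mock_populate` and `mock_unpopulate` are hypothetical names chosen for the illustration, and the teardown work is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with a state flag guarding its teardown path. */
struct mock_flag_tt {
	bool populated;		/* true while pages are populated */
	int unpopulate_runs;	/* how often the teardown body really ran */
};

/* Mark the object as populated; real allocation and mapping are omitted. */
static void mock_populate(struct mock_flag_tt *tt)
{
	assert(tt != NULL);
	tt->populated = true;
}

/*
 * Tear down only if the object is still populated. A repeated call
 * returns early, so the unmap/free work cannot run twice.
 */
static void mock_unpopulate(struct mock_flag_tt *tt)
{
	assert(tt != NULL);

	if (!tt->populated)
		return;

	tt->unpopulate_runs++;	/* stands in for the unmap and free work */
	tt->populated = false;
}
```

Compared with clearing each address individually, a single flag guards the whole teardown, so the check does not depend on any particular address value being treated as "unmapped".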
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index b40a467..b0df328 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
 		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
 
+		tt->dma_address[i] = 0;
+
 		i += num_pages;
 	}
 	ttm_pool_unpopulate(&tt->ttm);
Fixes oops.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)