Message ID: 1605936082-3099-6-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State: New, archived
Series: RFC Support hot device unplug in amdgpu
On 11/25/20 5:42 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> It's needed to drop iommu backed pages on device unplug
>> before device's IOMMU group is released.
>
> It would be cleaner if we could do the whole handling in TTM. I also need to
> double check what you are doing with this function.
>
> Christian.

Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to see
how I use it. I don't see why this should go into the TTM mid-layer - the stuff
I do inside is vendor specific, and I also don't think TTM is explicitly aware
of IOMMU. Or do you mean you would prefer the IOMMU notifier to be registered
from within TTM, which would then use a hook to call into a vendor specific
handler?

Andrey

>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 1ccf1ef..29248a5 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>  	else
>>  		ttm_pool_unpopulate(ttm);
>>  }
>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> On 11/25/20 5:42 AM, Christian König wrote:
>> It would be cleaner if we could do the whole handling in TTM. I also
>> need to double check what you are doing with this function.
>
> Check patch "drm/amdgpu: Register IOMMU topology notifier per device."
> to see how i use it. I don't see why this should go into TTM mid-layer -
> the stuff I do inside is vendor specific and also I don't think TTM is
> explicitly aware of IOMMU ?
> Do you mean you prefer the IOMMU notifier to be registered from within
> TTM and then use a hook to call into vendor specific handler ?

No, that is really vendor specific.

What I meant is to have a function like ttm_resource_manager_evict_all()
which you only need to call and all tt objects are unpopulated.

Give me a day or two to look into this.

Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
On 11/23/20 3:20 PM, Christian König wrote:
> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device."
>> to see how i use it. I don't see why this should go into TTM mid-layer -
>> the stuff I do inside is vendor specific and also I don't think TTM is
>> explicitly aware of IOMMU ?
>> Do you mean you prefer the IOMMU notifier to be registered from within
>> TTM and then use a hook to call into vendor specific handler ?
>
> No, that is really vendor specific.
>
> What I meant is to have a function like ttm_resource_manager_evict_all()
> which you only need to call and all tt objects are unpopulated.

So instead of the BO list I create and later iterate in amdgpu from the
IOMMU patch, you just want to do it within TTM with a single function?
Makes much more sense.

Andrey

> Give me a day or two to look into this.
>
> Christian.
Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> So instead of this BO list i create and later iterate in amdgpu from
> the IOMMU patch you just want to do it within TTM with a single
> function ? Makes much more sense.

Yes, exactly.

The list_empty() checks we have in TTM for the LRU are actually not the
best idea, we should now check the pin_count instead. This way we could
also have a list of the pinned BOs in TTM.

BTW: Have you thought about what happens when we unpopulate a BO while
we still try to use a kernel mapping for it? That could have unforeseen
consequences.

Christian.
On 11/23/20 3:41 PM, Christian König wrote:
> The list_empty() checks we have in TTM for the LRU are actually not the
> best idea, we should now check the pin_count instead. This way we could
> also have a list of the pinned BOs in TTM.

So from my IOMMU topology handler I will iterate the TTM LRU for the
unpinned BOs, and this new function for the pinned ones?
It's probably a good idea to combine both iterations into this new
function to cover all the BOs allocated on the device.

> BTW: Have you thought about what happens when we unpopulate a BO while
> we still try to use a kernel mapping for it? That could have unforeseen
> consequences.

Are you asking what happens to kmap or vmap style mapped CPU accesses
once we drop all the DMA backing pages for a particular BO? Because for
user mappings (mmap) we took care of this with the dummy page reroute,
but indeed nothing was done for in-kernel CPU mappings.

Andrey
Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> So from my IOMMU topology handler I will iterate the TTM LRU for the
> unpinned BOs and this new function for the pinned ones ?
> It's probably a good idea to combine both iterations into this new
> function to cover all the BOs allocated on the device.

Yes, that's what I had in my mind as well.

> Are you asking what happens to kmap or vmap style mapped CPU accesses
> once we drop all the DMA backing pages for a particular BO ? Because
> for user mappings (mmap) we took care of this with dummy page reroute
> but indeed nothing was done for in kernel CPU mappings.

Yes, exactly that. In other words, what happens if we free the ring
buffer while the kernel still writes to it?

Christian.
On 11/24/20 2:41 AM, Christian König wrote:
> Yes exactly that.
>
> In other words what happens if we free the ring buffer while the
> kernel still writes to it?

While we can't control user application accesses to the mapped buffers
explicitly, and hence we use page fault rerouting, I am thinking that in
this case we may be able to sprinkle drm_dev_enter/exit in any such
sensitive place where we might CPU-access a DMA buffer from the kernel.
Things like CPU page table updates, ring buffer accesses and FW memcpy -
are there other places?

Another point is that at this stage the driver shouldn't access any such
buffers anyway, as we are in the process of finishing the device. AFAIK
there is no page fault mechanism for kernel mappings, so I don't think
there is anything else to do?

Andrey
Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> While we can't control user application accesses to the mapped buffers
> explicitly and hence we use page fault rerouting
> I am thinking that in this case we may be able to sprinkle
> drm_dev_enter/exit in any such sensitive place were we might
> CPU access a DMA buffer from the kernel ?

Yes, I fear we are going to need that.

> Things like CPU page table updates, ring buffer accesses and FW memcpy
> ? Is there other places ?

Puh, good question. I have no idea.

> Another point is that at this point the driver shouldn't access any
> such buffers as we are at the process finishing the device.
> AFAIK there is no page fault mechanism for kernel mappings so I don't
> think there is anything else to do ?

Well, there is a page fault handler for kernel mappings, but that one
just prints the stack trace into the system log and calls BUG(); :)

Long story short: we need to avoid any access to released pages after
unplug, no matter if it's from the kernel or userspace.

Regards,
Christian.
On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > I am thinking that in this case we may be able to sprinkle
> > drm_dev_enter/exit in any such sensitive place were we might
> > CPU access a DMA buffer from the kernel ?
>
> Yes, I fear we are going to need that.

Uh... problem is that dma_buf_vmap mappings are usually permanent
things. Maybe we could stuff this into begin/end_cpu_access (but only
for the kernel, so a bit tricky)?

Btw, the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part
isn't yet solved in your series here.

-Daniel

> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Is there other places ?
>
> Puh, good question. I have no idea.
>
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are at the process finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
>
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
>
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
>
> Regards,
> Christian.
Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> It's needed to drop iommu backed pages on device unplug
> before device's IOMMU group is released.

It would be cleaner if we could do the whole handling in TTM. I also
need to double check what you are doing with this function.

Christian.

> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 1ccf1ef..29248a5 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>  	else
>  		ttm_pool_unpopulate(ttm);
>  }
> +EXPORT_SYMBOL(ttm_tt_unpopulate);
Am 25.11.20 um 11:40 schrieb Daniel Vetter: > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: >> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: >>> On 11/24/20 2:41 AM, Christian König wrote: >>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: >>>>> On 11/23/20 3:41 PM, Christian König wrote: >>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: >>>>>>> On 11/23/20 3:20 PM, Christian König wrote: >>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: >>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote: >>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: >>>>>>>>>>> It's needed to drop iommu backed pages on device unplug >>>>>>>>>>> before device's IOMMU group is released. >>>>>>>>>> It would be cleaner if we could do the whole >>>>>>>>>> handling in TTM. I also need to double check >>>>>>>>>> what you are doing with this function. >>>>>>>>>> >>>>>>>>>> Christian. >>>>>>>>> >>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology >>>>>>>>> notifier per device." to see >>>>>>>>> how i use it. I don't see why this should go >>>>>>>>> into TTM mid-layer - the stuff I do inside >>>>>>>>> is vendor specific and also I don't think TTM is >>>>>>>>> explicitly aware of IOMMU ? >>>>>>>>> Do you mean you prefer the IOMMU notifier to be >>>>>>>>> registered from within TTM >>>>>>>>> and then use a hook to call into vendor specific handler ? >>>>>>>> No, that is really vendor specific. >>>>>>>> >>>>>>>> What I meant is to have a function like >>>>>>>> ttm_resource_manager_evict_all() which you only need >>>>>>>> to call and all tt objects are unpopulated. >>>>>>> >>>>>>> So instead of this BO list i create and later iterate in >>>>>>> amdgpu from the IOMMU patch you just want to do it >>>>>>> within >>>>>>> TTM with a single function ? Makes much more sense. >>>>>> Yes, exactly. >>>>>> >>>>>> The list_empty() checks we have in TTM for the LRU are >>>>>> actually not the best idea, we should now check the >>>>>> pin_count instead. 
This way we could also have a list of the >>>>>> pinned BOs in TTM. >>>>> >>>>> So from my IOMMU topology handler I will iterate the TTM LRU for >>>>> the unpinned BOs and this new function for the pinned ones ? >>>>> It's probably a good idea to combine both iterations into this >>>>> new function to cover all the BOs allocated on the device. >>>> Yes, that's what I had in my mind as well. >>>> >>>>> >>>>>> BTW: Have you thought about what happens when we unpopulate >>>>>> a BO while we still try to use a kernel mapping for it? That >>>>>> could have unforeseen consequences. >>>>> >>>>> Are you asking what happens to kmap or vmap style mapped CPU >>>>> accesses once we drop all the DMA backing pages for a particular >>>>> BO ? Because for user mappings >>>>> (mmap) we took care of this with dummy page reroute but indeed >>>>> nothing was done for in kernel CPU mappings. >>>> Yes exactly that. >>>> >>>> In other words what happens if we free the ring buffer while the >>>> kernel still writes to it? >>>> >>>> Christian. >>> >>> While we can't control user application accesses to the mapped buffers >>> explicitly and hence we use page fault rerouting >>> I am thinking that in this case we may be able to sprinkle >>> drm_dev_enter/exit in any such sensitive place were we might >>> CPU access a DMA buffer from the kernel ? >> Yes, I fear we are going to need that. > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we > could stuff this into begin/end_cpu_access (but only for the kernel, so a > bit tricky)? Oh very very good point! I haven't thought about DMA-buf mmaps in this context yet. > btw the other issue with dma-buf (and even worse with dma_fence) is > refcounting of the underlying drm_device. I'd expect that all your > callbacks go boom if the dma_buf outlives your drm_device. That part isn't > yet solved in your series here. 
Well thinking more about this, it seems to be another really good argument why mapping pages from DMA-bufs into application address space directly is a very bad idea :) But yes, we essentially can't remove the device as long as there is a DMA-buf with mappings. No idea how to clean that one up. Christian. > -Daniel > >>> Things like CPU page table updates, ring buffer accesses and FW memcpy ? >>> Is there other places ? >> Puh, good question. I have no idea. >> >>> Another point is that at this point the driver shouldn't access any such >>> buffers as we are at the process finishing the device. >>> AFAIK there is no page fault mechanism for kernel mappings so I don't >>> think there is anything else to do ? >> Well there is a page fault handler for kernel mappings, but that one just >> prints the stack trace into the system log and calls BUG(); :) >> >> Long story short we need to avoid any access to released pages after unplug. >> No matter if it's from the kernel or userspace. >> >> Regards, >> Christian. >> >>> Andrey
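The direction settled on above — one TTM-side entry point that unpopulates every tt object on unplug, instead of a driver-maintained BO list — can be sketched in plain C. This is only a userspace model of the control flow; all `toy_*` names are invented for illustration, and the real TTM code (what became `ttm_resource_manager_evict_all()`) involves LRU handling, fences and locking far beyond this:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for a TTM buffer object: backing pages plus a pin count. */
struct toy_bo {
	void *pages;            /* backing storage, NULL once unpopulated */
	int pin_count;          /* pinned BOs would sit on a separate list */
	struct toy_bo *next;
};

/* The ttm_tt_unpopulate() step: drop the (IOMMU-backed) pages. */
static void toy_bo_unpopulate(struct toy_bo *bo)
{
	free(bo->pages);
	bo->pages = NULL;
}

/*
 * The "single function" idea: on unplug, walk every BO belonging to the
 * device -- pinned or not -- and unpopulate all of them, so the driver
 * needs no side list for the IOMMU teardown path.
 */
static void toy_evict_all(struct toy_bo *head)
{
	struct toy_bo *bo;

	for (bo = head; bo; bo = bo->next)
		toy_bo_unpopulate(bo);  /* pin_count is irrelevant on unplug */
}
```

The point of the sketch is only that pinned and unpinned BOs go through the same teardown walk, which is what makes a single TTM function preferable to two driver-side iterations.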
On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote: > Am 25.11.20 um 11:40 schrieb Daniel Vetter: > > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: > > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: > > > > On 11/24/20 2:41 AM, Christian König wrote: > > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: > > > > > > On 11/23/20 3:41 PM, Christian König wrote: > > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: > > > > > > > > On 11/23/20 3:20 PM, Christian König wrote: > > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: > > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote: > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: > > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug > > > > > > > > > > > > before device's IOMMU group is released. > > > > > > > > > > > It would be cleaner if we could do the whole > > > > > > > > > > > handling in TTM. I also need to double check > > > > > > > > > > > what you are doing with this function. > > > > > > > > > > > > > > > > > > > > > > Christian. > > > > > > > > > > > > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology > > > > > > > > > > notifier per device." to see > > > > > > > > > > how i use it. I don't see why this should go > > > > > > > > > > into TTM mid-layer - the stuff I do inside > > > > > > > > > > is vendor specific and also I don't think TTM is > > > > > > > > > > explicitly aware of IOMMU ? > > > > > > > > > > Do you mean you prefer the IOMMU notifier to be > > > > > > > > > > registered from within TTM > > > > > > > > > > and then use a hook to call into vendor specific handler ? > > > > > > > > > No, that is really vendor specific. > > > > > > > > > > > > > > > > > > What I meant is to have a function like > > > > > > > > > ttm_resource_manager_evict_all() which you only need > > > > > > > > > to call and all tt objects are unpopulated. 
> > > > > > > > > > > > > > > > So instead of this BO list i create and later iterate in > > > > > > > > amdgpu from the IOMMU patch you just want to do it > > > > > > > > within > > > > > > > > TTM with a single function ? Makes much more sense. > > > > > > > Yes, exactly. > > > > > > > > > > > > > > The list_empty() checks we have in TTM for the LRU are > > > > > > > actually not the best idea, we should now check the > > > > > > > pin_count instead. This way we could also have a list of the > > > > > > > pinned BOs in TTM. > > > > > > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for > > > > > > the unpinned BOs and this new function for the pinned ones ? > > > > > > It's probably a good idea to combine both iterations into this > > > > > > new function to cover all the BOs allocated on the device. > > > > > Yes, that's what I had in my mind as well. > > > > > > > > > > > > > > > > > > BTW: Have you thought about what happens when we unpopulate > > > > > > > a BO while we still try to use a kernel mapping for it? That > > > > > > > could have unforeseen consequences. > > > > > > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU > > > > > > accesses once we drop all the DMA backing pages for a particular > > > > > > BO ? Because for user mappings > > > > > > (mmap) we took care of this with dummy page reroute but indeed > > > > > > nothing was done for in kernel CPU mappings. > > > > > Yes exactly that. > > > > > > > > > > In other words what happens if we free the ring buffer while the > > > > > kernel still writes to it? > > > > > > > > > > Christian. > > > > > > > > While we can't control user application accesses to the mapped buffers > > > > explicitly and hence we use page fault rerouting > > > > I am thinking that in this case we may be able to sprinkle > > > > drm_dev_enter/exit in any such sensitive place were we might > > > > CPU access a DMA buffer from the kernel ? 
> > > Yes, I fear we are going to need that. > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we > > could stuff this into begin/end_cpu_access (but only for the kernel, so a > > bit tricky)? > > Oh very very good point! I haven't thought about DMA-buf mmaps in this > context yet. > > > > btw the other issue with dma-buf (and even worse with dma_fence) is > > refcounting of the underlying drm_device. I'd expect that all your > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't > > yet solved in your series here. > > Well thinking more about this, it seems to be a another really good argument > why mapping pages from DMA-bufs into application address space directly is a > very bad idea :) > > But yes, we essentially can't remove the device as long as there is a > DMA-buf with mappings. No idea how to clean that one up. drm_dev_get/put in drm_prime helpers should get us like 90% there I think. The even more worrying thing is random dma_fence attached to the dma_resv object. We could try to clean all of ours up, but they could have escaped already into some other driver. And since we're talking about egpu hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case. I have no idea how to fix that one :-/ -Daniel > > Christian. > > > -Daniel > > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ? > > > > Is there other places ? > > > Puh, good question. I have no idea. > > > > > > > Another point is that at this point the driver shouldn't access any such > > > > buffers as we are at the process finishing the device. > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't > > > > think there is anything else to do ? > > > Well there is a page fault handler for kernel mappings, but that one just > > > prints the stack trace into the system log and calls BUG(); :) > > > > > > Long story short we need to avoid any access to released pages after unplug. 
> > > No matter if it's from the kernel or userspace. > > > > > > Regards, > > > Christian. > > > > > > > Andrey >
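The "sprinkle drm_dev_enter/exit in any sensitive place" idea discussed above reduces to a guard pattern: every kernel-side CPU access to device pages is bracketed, and the bracket fails once the device is gone. The following is a hedged userspace model with invented `toy_*` names — the real `drm_dev_enter()`/`drm_dev_exit()` pair is SRCU-based so that unplug can wait for in-flight readers to drain, which this sketch omits:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool unplugged;
};

/* Model of drm_dev_enter(): fails once the device was unplugged. */
static bool toy_dev_enter(struct toy_dev *dev)
{
	return !dev->unplugged;    /* real code: srcu_read_lock + flag check */
}

/* Model of drm_dev_exit(). */
static void toy_dev_exit(struct toy_dev *dev)
{
	(void)dev;                 /* real code: srcu_read_unlock */
}

/* A "sensitive place": a ring-buffer write that must not touch freed pages. */
static bool toy_ring_write(struct toy_dev *dev, int *ring, int val)
{
	if (!toy_dev_enter(dev))
		return false;      /* device gone: skip the access entirely */
	*ring = val;               /* safe: pages cannot be freed while inside */
	toy_dev_exit(dev);
	return true;
}
```

This matches the conclusion of the thread so far: after unplug, any access — kernel or userspace — to released pages must be avoided, so the access itself has to be conditional, not just the mapping.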
On 2020-11-25 1:57 p.m., Christian König wrote: > > Well thinking more about this, it seems to be a another really good > argument why mapping pages from DMA-bufs into application address space > directly is a very bad idea :) Apologies for going off on a tangent here... Since allowing userspace mmap with dma-buf fds seems to be a trap in general[0], I wonder if there's any way we could stop supporting that? [0] E.g. mutter had to disable handing out dma-bufs for screen capture by default with non-i915 for now, because in particular with discrete GPUs, direct CPU reads can be unusably slow (think single-digit frames per second), and of course there's other userspace which goes "ooh, dma-buf, let's map and read!".
On Wed, Nov 25, 2020 at 5:56 PM Michel Dänzer <michel@daenzer.net> wrote: > > On 2020-11-25 1:57 p.m., Christian König wrote: > > > > Well thinking more about this, it seems to be a another really good > > argument why mapping pages from DMA-bufs into application address space > > directly is a very bad idea :) > > Apologies for going off on a tangent here... > > Since allowing userspace mmap with dma-buf fds seems to be a trap in > general[0], I wonder if there's any way we could stop supporting that? > > > [0] E.g. mutter had to disable handing out dma-bufs for screen capture > by default with non-i915 for now, because in particular with discrete > GPUs, direct CPU reads can be unusably slow (think single-digit frames > per second), and of course there's other userspace which goes "ooh, > dma-buf, let's map and read!". I think a pile of applications (cros included) use it to do uploads across process boundaries. Think locked down jpeg decoder and stuff like that. For that use-case it seems to work ok. But yeah don't read from dma-buf. I'm pretty sure it's dead slow on almost everything, except integrated GPUs, which have A) a coherent fabric between CPU and GPU and B) that fabric is actually faster for rendering in general, not just for dedicated buffers allocated for down/upload. -Daniel
On 11/25/20 11:36 AM, Daniel Vetter wrote: > On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote: >> Am 25.11.20 um 11:40 schrieb Daniel Vetter: >>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: >>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: >>>>> On 11/24/20 2:41 AM, Christian König wrote: >>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: >>>>>>> On 11/23/20 3:41 PM, Christian König wrote: >>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: >>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote: >>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: >>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote: >>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: >>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug >>>>>>>>>>>>> before device's IOMMU group is released. >>>>>>>>>>>> It would be cleaner if we could do the whole >>>>>>>>>>>> handling in TTM. I also need to double check >>>>>>>>>>>> what you are doing with this function. >>>>>>>>>>>> >>>>>>>>>>>> Christian. >>>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology >>>>>>>>>>> notifier per device." to see >>>>>>>>>>> how i use it. I don't see why this should go >>>>>>>>>>> into TTM mid-layer - the stuff I do inside >>>>>>>>>>> is vendor specific and also I don't think TTM is >>>>>>>>>>> explicitly aware of IOMMU ? >>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be >>>>>>>>>>> registered from within TTM >>>>>>>>>>> and then use a hook to call into vendor specific handler ? >>>>>>>>>> No, that is really vendor specific. >>>>>>>>>> >>>>>>>>>> What I meant is to have a function like >>>>>>>>>> ttm_resource_manager_evict_all() which you only need >>>>>>>>>> to call and all tt objects are unpopulated. >>>>>>>>> So instead of this BO list i create and later iterate in >>>>>>>>> amdgpu from the IOMMU patch you just want to do it >>>>>>>>> within >>>>>>>>> TTM with a single function ? Makes much more sense. >>>>>>>> Yes, exactly. 
>>>>>>>> >>>>>>>> The list_empty() checks we have in TTM for the LRU are >>>>>>>> actually not the best idea, we should now check the >>>>>>>> pin_count instead. This way we could also have a list of the >>>>>>>> pinned BOs in TTM. >>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for >>>>>>> the unpinned BOs and this new function for the pinned ones ? >>>>>>> It's probably a good idea to combine both iterations into this >>>>>>> new function to cover all the BOs allocated on the device. >>>>>> Yes, that's what I had in my mind as well. >>>>>> >>>>>>>> BTW: Have you thought about what happens when we unpopulate >>>>>>>> a BO while we still try to use a kernel mapping for it? That >>>>>>>> could have unforeseen consequences. >>>>>>> Are you asking what happens to kmap or vmap style mapped CPU >>>>>>> accesses once we drop all the DMA backing pages for a particular >>>>>>> BO ? Because for user mappings >>>>>>> (mmap) we took care of this with dummy page reroute but indeed >>>>>>> nothing was done for in kernel CPU mappings. >>>>>> Yes exactly that. >>>>>> >>>>>> In other words what happens if we free the ring buffer while the >>>>>> kernel still writes to it? >>>>>> >>>>>> Christian. >>>>> While we can't control user application accesses to the mapped buffers >>>>> explicitly and hence we use page fault rerouting >>>>> I am thinking that in this case we may be able to sprinkle >>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>> CPU access a DMA buffer from the kernel ? >>>> Yes, I fear we are going to need that. >>> Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we >>> could stuff this into begin/end_cpu_access Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access driver specific hook ? >>> (but only for the kernel, so a >>> bit tricky)? Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by some user process ? 
And if we do need this distinction I think we should be able to differentiate by looking at the current->mm (i.e. mm_struct) pointer being NULL for kernel threads. >> Oh very very good point! I haven't thought about DMA-buf mmaps in this >> context yet. >> >> >>> btw the other issue with dma-buf (and even worse with dma_fence) is >>> refcounting of the underlying drm_device. I'd expect that all your >>> callbacks go boom if the dma_buf outlives your drm_device. That part isn't >>> yet solved in your series here. >> Well thinking more about this, it seems to be a another really good argument >> why mapping pages from DMA-bufs into application address space directly is a >> very bad idea :) >> >> But yes, we essentially can't remove the device as long as there is a >> DMA-buf with mappings. No idea how to clean that one up. > drm_dev_get/put in drm_prime helpers should get us like 90% there I think. What are the other 10% ? > > The even more worrying thing is random dma_fence attached to the dma_resv > object. We could try to clean all of ours up, but they could have escaped > already into some other driver. And since we're talking about egpu > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case. > > I have no how to fix that one :-/ > -Daniel I assume you are referring to the sync_file_create/sync_file_get_fence API for dma_fence export/import ? So with DMA-bufs we have the drm_gem_object as exporter specific private data and so we can do drm_dev_get and put at the drm_gem_object layer to bind the device life cycle to that of each GEM object, but we don't have such a mid-layer for dma_fence which could allow us to increment the device reference for each fence out there related to that device - is my understanding correct ? Andrey >> Christian. >> >>> -Daniel >>> >>>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ? >>>>> Is there other places ? >>>>> Puh, good question. I have no idea. 
>>>> >>>>> Another point is that at this point the driver shouldn't access any such >>>>> buffers as we are at the process finishing the device. >>>>> AFAIK there is no page fault mechanism for kernel mappings so I don't >>>>> think there is anything else to do ? >>>> Well there is a page fault handler for kernel mappings, but that one just >>>> prints the stack trace into the system log and calls BUG(); :) >>>> >>>> Long story short we need to avoid any access to released pages after unplug. >>>> No matter if it's from the kernel or userspace. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> Andrey
Hey Daniel, just a ping on a bunch of questions I posted below. Andrey
On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote: > > On 11/25/20 11:36 AM, Daniel Vetter wrote: > > On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote: > > > Am 25.11.20 um 11:40 schrieb Daniel Vetter: > > > > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: > > > > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: > > > > > > On 11/24/20 2:41 AM, Christian König wrote: > > > > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: > > > > > > > > On 11/23/20 3:41 PM, Christian König wrote: > > > > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: > > > > > > > > > > On 11/23/20 3:20 PM, Christian König wrote: > > > > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: > > > > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote: > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: > > > > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug > > > > > > > > > > > > > > before device's IOMMU group is released. > > > > > > > > > > > > > It would be cleaner if we could do the whole > > > > > > > > > > > > > handling in TTM. I also need to double check > > > > > > > > > > > > > what you are doing with this function. > > > > > > > > > > > > > > > > > > > > > > > > > > Christian. > > > > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology > > > > > > > > > > > > notifier per device." to see > > > > > > > > > > > > how i use it. I don't see why this should go > > > > > > > > > > > > into TTM mid-layer - the stuff I do inside > > > > > > > > > > > > is vendor specific and also I don't think TTM is > > > > > > > > > > > > explicitly aware of IOMMU ? > > > > > > > > > > > > Do you mean you prefer the IOMMU notifier to be > > > > > > > > > > > > registered from within TTM > > > > > > > > > > > > and then use a hook to call into vendor specific handler ? > > > > > > > > > > > No, that is really vendor specific. 
> > > > > > > > > > > > > > > > > > > > > > What I meant is to have a function like > > > > > > > > > > > ttm_resource_manager_evict_all() which you only need > > > > > > > > > > > to call and all tt objects are unpopulated. > > > > > > > > > > So instead of this BO list i create and later iterate in > > > > > > > > > > amdgpu from the IOMMU patch you just want to do it > > > > > > > > > > within > > > > > > > > > > TTM with a single function ? Makes much more sense. > > > > > > > > > Yes, exactly. > > > > > > > > > > > > > > > > > > The list_empty() checks we have in TTM for the LRU are > > > > > > > > > actually not the best idea, we should now check the > > > > > > > > > pin_count instead. This way we could also have a list of the > > > > > > > > > pinned BOs in TTM. > > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for > > > > > > > > the unpinned BOs and this new function for the pinned ones ? > > > > > > > > It's probably a good idea to combine both iterations into this > > > > > > > > new function to cover all the BOs allocated on the device. > > > > > > > Yes, that's what I had in my mind as well. > > > > > > > > > > > > > > > > BTW: Have you thought about what happens when we unpopulate > > > > > > > > > a BO while we still try to use a kernel mapping for it? That > > > > > > > > > could have unforeseen consequences. > > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU > > > > > > > > accesses once we drop all the DMA backing pages for a particular > > > > > > > > BO ? Because for user mappings > > > > > > > > (mmap) we took care of this with dummy page reroute but indeed > > > > > > > > nothing was done for in kernel CPU mappings. > > > > > > > Yes exactly that. > > > > > > > > > > > > > > In other words what happens if we free the ring buffer while the > > > > > > > kernel still writes to it? > > > > > > > > > > > > > > Christian. 
> > > > > > While we can't control user application accesses to the mapped buffers > > > > > > explicitly and hence we use page fault rerouting > > > > > > I am thinking that in this case we may be able to sprinkle > > > > > > drm_dev_enter/exit in any such sensitive place were we might > > > > > > CPU access a DMA buffer from the kernel ? > > > > > Yes, I fear we are going to need that. > > > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we > > > > could stuff this into begin/end_cpu_access > > > Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access > driver specific hook ? > > > > > > (but only for the kernel, so a > > > > bit tricky)? > > > Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by > some user process ? And if we do need this distinction I think we should be able to > differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL > for kernel thread. Userspace mmap is handled by punching out the pte. So we don't need to do anything special there. For kernel mmap the begin/end should be all in the same context (so we could use the srcu lock that works underneath drm_dev_enter/exit), since at least right now kernel vmaps of dma-buf are very long-lived. But the good news is that Thomas Zimmerman is working on this problem already for different reasons, so it might be that we won't have any long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in there. > > > Oh very very good point! I haven't thought about DMA-buf mmaps in this > > > context yet. > > > > > > > > > > btw the other issue with dma-buf (and even worse with dma_fence) is > > > > refcounting of the underlying drm_device. I'd expect that all your > > > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't > > > > yet solved in your series here. 
> > > Well thinking more about this, it seems to be a another really good argument > > > why mapping pages from DMA-bufs into application address space directly is a > > > very bad idea :) > > > > > > But yes, we essentially can't remove the device as long as there is a > > > DMA-buf with mappings. No idea how to clean that one up. > > drm_dev_get/put in drm_prime helpers should get us like 90% there I think. > > > What are the other 10% ? dma_fence, which is also about 90% of the work probably. But I'm guesstimating only 10% of the oopses you can hit. Since generally the dma_fence for a buffer don't outlive the underlying buffer. So usually no problems happen when we've solved the dma-buf sharing, but the dma_fence can outlive the dma-buf, so there's still possibilities of crashing. > > The even more worrying thing is random dma_fence attached to the dma_resv > > object. We could try to clean all of ours up, but they could have escaped > > already into some other driver. And since we're talking about egpu > > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case. > > > > I have no how to fix that one :-/ > > -Daniel > > > I assume you are referring to sync_file_create/sync_file_get_fence API for > dma_fence export/import ? So dma_fence is a general issue, there's a pile of interfaces that result in sharing with other drivers: - dma_resv in the dma_buf - sync_file - drm_syncobj (but I think that's not yet cross driver, but probably changes) In each of these cases drivers can pick up the dma_fence and use it internally for all kinds of purposes (could end up in the scheduler or wherever). 
> So with DMA bufs we have the drm_gem_object as exporter specific private data > and so we can do drm_dev_get and put at the drm_gem_object layer to bind > device life cycle > to that of each GEM object but, we don't have such mid-layer for dma_fence > which could allow > us to increment device reference for each fence out there related to that > device - is my understanding correct ? Yeah that's the annoying part with dma-fence. No existing generic place to put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the other problems first. -Daniel > > Andrey > > > Andrey > > > > > Christian. > > > > > > > -Daniel > > > > > > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ? > > > > > > Is there other places ? > > > > > Puh, good question. I have no idea. > > > > > > > > > > > Another point is that at this point the driver shouldn't access any such > > > > > > buffers as we are at the process finishing the device. > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't > > > > > > think there is anything else to do ? > > > > > Well there is a page fault handler for kernel mappings, but that one just > > > > > prints the stack trace into the system log and calls BUG(); :) > > > > > > > > > > Long story short we need to avoid any access to released pages after unplug. > > > > > No matter if it's from the kernel or userspace. > > > > > > > > > > Regards, > > > > > Christian. > > > > > > > > > > > Andrey
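The refcounting point Daniel and Andrey converge on above — the GEM object being the natural mid-layer that can pin the device across a dma-buf export, while dma_fence has no equivalent per-object hook — can be modeled with plain reference counting. Everything below is an invented `toy_*` sketch of that lifetime relationship, not the actual drm_prime helper code:

```c
#include <assert.h>

/* Refcounted stand-in for drm_device. */
struct toy_dev {
	int refcount;
	int alive;              /* cleared when the last reference drops */
};

static void toy_dev_get(struct toy_dev *dev)
{
	dev->refcount++;
}

static void toy_dev_put(struct toy_dev *dev)
{
	if (--dev->refcount == 0)
		dev->alive = 0; /* real code: the drm_device release path runs */
}

/*
 * The GEM object is the mid-layer: take a device reference when the
 * object is exported as a dma-buf, drop it when the buffer is released.
 * dma_fence has no such per-object hook, which is the gap discussed above.
 */
struct toy_gem {
	struct toy_dev *dev;
};

static void toy_gem_export(struct toy_gem *obj)
{
	toy_dev_get(obj->dev);
}

static void toy_gem_release(struct toy_gem *obj)
{
	toy_dev_put(obj->dev);
}
```

With this structure an exported buffer keeps the device object alive past unplug, which is the "90%" Daniel refers to; fences that escaped into other drivers have no object of their own to hang the reference on.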
On 11/27/20 9:59 AM, Daniel Vetter wrote: > On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote: >> On 11/25/20 11:36 AM, Daniel Vetter wrote: >>> On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote: >>>> Am 25.11.20 um 11:40 schrieb Daniel Vetter: >>>>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: >>>>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: >>>>>>> On 11/24/20 2:41 AM, Christian König wrote: >>>>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: >>>>>>>>> On 11/23/20 3:41 PM, Christian König wrote: >>>>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: >>>>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote: >>>>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: >>>>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote: >>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: >>>>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug >>>>>>>>>>>>>>> before device's IOMMU group is released. >>>>>>>>>>>>>> It would be cleaner if we could do the whole >>>>>>>>>>>>>> handling in TTM. I also need to double check >>>>>>>>>>>>>> what you are doing with this function. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Christian. >>>>>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology >>>>>>>>>>>>> notifier per device." to see >>>>>>>>>>>>> how i use it. I don't see why this should go >>>>>>>>>>>>> into TTM mid-layer - the stuff I do inside >>>>>>>>>>>>> is vendor specific and also I don't think TTM is >>>>>>>>>>>>> explicitly aware of IOMMU ? >>>>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be >>>>>>>>>>>>> registered from within TTM >>>>>>>>>>>>> and then use a hook to call into vendor specific handler ? >>>>>>>>>>>> No, that is really vendor specific. >>>>>>>>>>>> >>>>>>>>>>>> What I meant is to have a function like >>>>>>>>>>>> ttm_resource_manager_evict_all() which you only need >>>>>>>>>>>> to call and all tt objects are unpopulated. 
>>>>>>>>>>> So instead of this BO list i create and later iterate in >>>>>>>>>>> amdgpu from the IOMMU patch you just want to do it >>>>>>>>>>> within >>>>>>>>>>> TTM with a single function ? Makes much more sense. >>>>>>>>>> Yes, exactly. >>>>>>>>>> >>>>>>>>>> The list_empty() checks we have in TTM for the LRU are >>>>>>>>>> actually not the best idea, we should now check the >>>>>>>>>> pin_count instead. This way we could also have a list of the >>>>>>>>>> pinned BOs in TTM. >>>>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for >>>>>>>>> the unpinned BOs and this new function for the pinned ones ? >>>>>>>>> It's probably a good idea to combine both iterations into this >>>>>>>>> new function to cover all the BOs allocated on the device. >>>>>>>> Yes, that's what I had in my mind as well. >>>>>>>> >>>>>>>>>> BTW: Have you thought about what happens when we unpopulate >>>>>>>>>> a BO while we still try to use a kernel mapping for it? That >>>>>>>>>> could have unforeseen consequences. >>>>>>>>> Are you asking what happens to kmap or vmap style mapped CPU >>>>>>>>> accesses once we drop all the DMA backing pages for a particular >>>>>>>>> BO ? Because for user mappings >>>>>>>>> (mmap) we took care of this with dummy page reroute but indeed >>>>>>>>> nothing was done for in kernel CPU mappings. >>>>>>>> Yes exactly that. >>>>>>>> >>>>>>>> In other words what happens if we free the ring buffer while the >>>>>>>> kernel still writes to it? >>>>>>>> >>>>>>>> Christian. >>>>>>> While we can't control user application accesses to the mapped buffers >>>>>>> explicitly and hence we use page fault rerouting >>>>>>> I am thinking that in this case we may be able to sprinkle >>>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>>> CPU access a DMA buffer from the kernel ? >>>>>> Yes, I fear we are going to need that. >>>>> Uh ... problem is that dma_buf_vmap are usually permanent things. 
Maybe we >>>>> could stuff this into begin/end_cpu_access >> >> Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access >> driver specific hook ? >> >> >>>>> (but only for the kernel, so a >>>>> bit tricky)? >> >> Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by >> some user process ? And if we do need this distinction I think we should be able to >> differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL >> for kernel thread. > Userspace mmap is handled by punching out the pte. So we don't need to do > anything special there. > > For kernel mmap the begin/end should be all in the same context (so we > could use the srcu lock that works underneath drm_dev_enter/exit), since > at least right now kernel vmaps of dma-buf are very long-lived. If by the same context you mean the right drm_device (the exporter's one) then this should be ok, as I see from the amdgpu implementation of the callback, amdgpu_dma_buf_begin_cpu_access. We just need to add a handler for the .end_cpu_access callback to call drm_dev_exit there. Andrey > > But the good news is that Thomas Zimmerman is working on this problem > already for different reasons, so it might be that we won't have any > long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in > there. >>>> Oh very very good point! I haven't thought about DMA-buf mmaps in this >>>> context yet. >>>> >>>> >>>>> btw the other issue with dma-buf (and even worse with dma_fence) is >>>>> refcounting of the underlying drm_device. I'd expect that all your >>>>> callbacks go boom if the dma_buf outlives your drm_device. That part isn't >>>>> yet solved in your series here. 
>>>> Well thinking more about this, it seems to be a another really good argument >>>> why mapping pages from DMA-bufs into application address space directly is a >>>> very bad idea :) >>>> >>>> But yes, we essentially can't remove the device as long as there is a >>>> DMA-buf with mappings. No idea how to clean that one up. >>> drm_dev_get/put in drm_prime helpers should get us like 90% there I think. >> >> What are the other 10% ? > dma_fence, which is also about 90% of the work probably. But I'm > guesstimating only 10% of the oopses you can hit. Since generally the > dma_fence for a buffer don't outlive the underlying buffer. So usually no > problems happen when we've solved the dma-buf sharing, but the dma_fence > can outlive the dma-buf, so there's still possibilities of crashing. > >>> The even more worrying thing is random dma_fence attached to the dma_resv >>> object. We could try to clean all of ours up, but they could have escaped >>> already into some other driver. And since we're talking about egpu >>> hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case. >>> >>> I have no how to fix that one :-/ >>> -Daniel >> >> I assume you are referring to sync_file_create/sync_file_get_fence API for >> dma_fence export/import ? > So dma_fence is a general issue, there's a pile of interfaces that result > in sharing with other drivers: > - dma_resv in the dma_buf > - sync_file > - drm_syncobj (but I think that's not yet cross driver, but probably > changes) > > In each of these cases drivers can pick up the dma_fence and use it > internally for all kinds of purposes (could end up in the scheduler or > wherever). 
> >> So with DMA bufs we have the drm_gem_object as exporter specific private data >> and so we can do drm_dev_get and put at the drm_gem_object layer to bind >> device life cycle >> to that of each GEM object but, we don't have such mid-layer for dma_fence >> which could allow >> us to increment device reference for each fence out there related to that >> device - is my understanding correct ? > Yeah that's the annoying part with dma-fence. No existing generic place to > put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the > other problems first. > -Daniel > >> Andrey >> >> >> Andrey >> >> >>>> Christian. >>>> >>>>> -Daniel >>>>> >>>>>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ? >>>>>>> Is there other places ? >>>>>> Puh, good question. I have no idea. >>>>>> >>>>>>> Another point is that at this point the driver shouldn't access any such >>>>>>> buffers as we are at the process finishing the device. >>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I don't >>>>>>> think there is anything else to do ? >>>>>> Well there is a page fault handler for kernel mappings, but that one just >>>>>> prints the stack trace into the system log and calls BUG(); :) >>>>>> >>>>>> Long story short we need to avoid any access to released pages after unplug. >>>>>> No matter if it's from the kernel or userspace. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>>> Andrey
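The begin/end_cpu_access guarding that Andrey proposes in this message pairs naturally with the enter/exit discipline: begin enters the device and refuses once it is unplugged, end performs the matching exit. A hedged userspace model — the names are invented, and the real hooks are the driver's `dma_buf_ops.begin_cpu_access`/`.end_cpu_access` implementations:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ENODEV 19           /* stand-in for the kernel's ENODEV */

struct toy_dev {
	bool unplugged;
};

static bool toy_dev_enter(struct toy_dev *dev) { return !dev->unplugged; }
static void toy_dev_exit(struct toy_dev *dev)  { (void)dev; }

struct toy_dmabuf {
	struct toy_dev *exporter;   /* device that exported the buffer */
	bool cpu_access;            /* true between begin and end */
};

/* Model of .begin_cpu_access: refuse once the exporter is gone. */
static int toy_begin_cpu_access(struct toy_dmabuf *buf)
{
	if (!toy_dev_enter(buf->exporter))
		return -TOY_ENODEV;
	buf->cpu_access = true;
	return 0;
}

/* Model of .end_cpu_access: the matching drm_dev_exit() goes here. */
static void toy_end_cpu_access(struct toy_dmabuf *buf)
{
	buf->cpu_access = false;
	toy_dev_exit(buf->exporter);
}
```

Note the assumption Daniel states above: begin and end run in the same context, which is what lets the real implementation keep the SRCU read section open between the two callbacks.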
On Fri, Nov 27, 2020 at 11:04:55AM -0500, Andrey Grodzovsky wrote: > > On 11/27/20 9:59 AM, Daniel Vetter wrote: > > On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote: > > > On 11/25/20 11:36 AM, Daniel Vetter wrote: > > > > On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote: > > > > > Am 25.11.20 um 11:40 schrieb Daniel Vetter: > > > > > > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote: > > > > > > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: > > > > > > > > On 11/24/20 2:41 AM, Christian König wrote: > > > > > > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: > > > > > > > > > > On 11/23/20 3:41 PM, Christian König wrote: > > > > > > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: > > > > > > > > > > > > On 11/23/20 3:20 PM, Christian König wrote: > > > > > > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: > > > > > > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote: > > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: > > > > > > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug > > > > > > > > > > > > > > > > before device's IOMMU group is released. > > > > > > > > > > > > > > > It would be cleaner if we could do the whole > > > > > > > > > > > > > > > handling in TTM. I also need to double check > > > > > > > > > > > > > > > what you are doing with this function. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Christian. > > > > > > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology > > > > > > > > > > > > > > notifier per device." to see > > > > > > > > > > > > > > how i use it. I don't see why this should go > > > > > > > > > > > > > > into TTM mid-layer - the stuff I do inside > > > > > > > > > > > > > > is vendor specific and also I don't think TTM is > > > > > > > > > > > > > > explicitly aware of IOMMU ? 
> > > > > > > > > > > > > > Do you mean you prefer the IOMMU notifier to be > > > > > > > > > > > > > > registered from within TTM > > > > > > > > > > > > > > and then use a hook to call into vendor specific handler ? > > > > > > > > > > > > > No, that is really vendor specific. > > > > > > > > > > > > > > > > > > > > > > > > > > What I meant is to have a function like > > > > > > > > > > > > > ttm_resource_manager_evict_all() which you only need > > > > > > > > > > > > > to call and all tt objects are unpopulated. > > > > > > > > > > > > So instead of this BO list i create and later iterate in > > > > > > > > > > > > amdgpu from the IOMMU patch you just want to do it > > > > > > > > > > > > within > > > > > > > > > > > > TTM with a single function ? Makes much more sense. > > > > > > > > > > > Yes, exactly. > > > > > > > > > > > > > > > > > > > > > > The list_empty() checks we have in TTM for the LRU are > > > > > > > > > > > actually not the best idea, we should now check the > > > > > > > > > > > pin_count instead. This way we could also have a list of the > > > > > > > > > > > pinned BOs in TTM. > > > > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for > > > > > > > > > > the unpinned BOs and this new function for the pinned ones ? > > > > > > > > > > It's probably a good idea to combine both iterations into this > > > > > > > > > > new function to cover all the BOs allocated on the device. > > > > > > > > > Yes, that's what I had in my mind as well. > > > > > > > > > > > > > > > > > > > > BTW: Have you thought about what happens when we unpopulate > > > > > > > > > > > a BO while we still try to use a kernel mapping for it? That > > > > > > > > > > > could have unforeseen consequences. > > > > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU > > > > > > > > > > accesses once we drop all the DMA backing pages for a particular > > > > > > > > > > BO ? 
Because for user mappings > > > > > > > > > > (mmap) we took care of this with dummy page reroute but indeed > > > > > > > > > > nothing was done for in kernel CPU mappings. > > > > > > > > > Yes exactly that. > > > > > > > > > > > > > > > > > > In other words what happens if we free the ring buffer while the > > > > > > > > > kernel still writes to it? > > > > > > > > > > > > > > > > > > Christian. > > > > > > > > While we can't control user application accesses to the mapped buffers > > > > > > > > explicitly and hence we use page fault rerouting > > > > > > > > I am thinking that in this case we may be able to sprinkle > > > > > > > > drm_dev_enter/exit in any such sensitive place were we might > > > > > > > > CPU access a DMA buffer from the kernel ? > > > > > > > Yes, I fear we are going to need that. > > > > > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we > > > > > > could stuff this into begin/end_cpu_access > > > > > > Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access > > > driver specific hook ? > > > > > > > > > > > > (but only for the kernel, so a > > > > > > bit tricky)? > > > > > > Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by > > > some user process ? And if we do need this distinction I think we should be able to > > > differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL > > > for kernel thread. > > Userspace mmap is handled by punching out the pte. So we don't need to do > > anything special there. > > > > For kernel mmap the begin/end should be all in the same context (so we > > could use the srcu lock that works underneath drm_dev_enter/exit), since > > at least right now kernel vmaps of dma-buf are very long-lived. > > > If by same context you mean the right drm_device (the exporter's one) > then this should be ok as I am seeing from amdgpu implementation > of the callback - amdgpu_dma_buf_begin_cpu_access. 
We just need to add > handler for .end_cpu_access callback to call drm_dev_exit there. Same context = same system call essentially. You cannot hold locks while returning to userspace. And current userspace can call the begin/end_cpu_access callbacks through ioctls, so just putting a drm_dev_enter/exit in them will break really badly. Iirc there's an igt also for testing these ioctls - if there isn't we really should have one. Hence why we need to be more careful here about who's calling and where we can put the drm_dev_enter/exit. -Daniel > > Andrey > > > > > > But the good news is that Thomas Zimmermann is working on this problem > > already for different reasons, so it might be that we won't have any > > long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in > > there. > > > > > > > Oh very very good point! I haven't thought about DMA-buf mmaps in this > > > > > context yet. > > > > > > > > > > > > > > > > btw the other issue with dma-buf (and even worse with dma_fence) is > > > > > > refcounting of the underlying drm_device. I'd expect that all your > > > > > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't > > > > > > yet solved in your series here. > > > > > Well thinking more about this, it seems to be another really good argument > > > > > why mapping pages from DMA-bufs into application address space directly is a > > > > > very bad idea :) > > > > > > > > > > But yes, we essentially can't remove the device as long as there is a > > > > > DMA-buf with mappings. No idea how to clean that one up. > > > > drm_dev_get/put in drm_prime helpers should get us like 90% there I think. > > > > > > What are the other 10% ? > > dma_fence, which is also about 90% of the work probably. But I'm > > guesstimating only 10% of the oopses you can hit. Since generally the > > dma_fence for a buffer doesn't outlive the underlying buffer. 
So usually no > > problems happen when we've solved the dma-buf sharing, but the dma_fence > > can outlive the dma-buf, so there's still possibilities of crashing. > > > > > > The even more worrying thing is random dma_fence attached to the dma_resv > > > > object. We could try to clean all of ours up, but they could have escaped > > > > already into some other driver. And since we're talking about egpu > > > > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case. > > > > > > > > I have no how to fix that one :-/ > > > > -Daniel > > > > > > I assume you are referring to sync_file_create/sync_file_get_fence API for > > > dma_fence export/import ? > > So dma_fence is a general issue, there's a pile of interfaces that result > > in sharing with other drivers: > > - dma_resv in the dma_buf > > - sync_file > > - drm_syncobj (but I think that's not yet cross driver, but probably > > changes) > > > > In each of these cases drivers can pick up the dma_fence and use it > > internally for all kinds of purposes (could end up in the scheduler or > > wherever). > > > > > So with DMA bufs we have the drm_gem_object as exporter specific private data > > > and so we can do drm_dev_get and put at the drm_gem_object layer to bind > > > device life cycle > > > to that of each GEM object but, we don't have such mid-layer for dma_fence > > > which could allow > > > us to increment device reference for each fence out there related to that > > > device - is my understanding correct ? > > Yeah that's the annoying part with dma-fence. No existing generic place to > > put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the > > other problems first. > > -Daniel > > > > > Andrey > > > > > > > > > Andrey > > > > > > > > > > > Christian. > > > > > > > > > > > -Daniel > > > > > > > > > > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ? > > > > > > > > Is there other places ? > > > > > > > Puh, good question. I have no idea. 
> > > > > > > > > > > > > > > Another point is that at this point the driver shouldn't access any such > > > > > > > > buffers as we are at the process finishing the device. > > > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't > > > > > > > > think there is anything else to do ? > > > > > > > Well there is a page fault handler for kernel mappings, but that one just > > > > > > > prints the stack trace into the system log and calls BUG(); :) > > > > > > > > > > > > > > Long story short we need to avoid any access to released pages after unplug. > > > > > > > No matter if it's from the kernel or userspace. > > > > > > > > > > > > > > Regards, > > > > > > > Christian. > > > > > > > > > > > > > > > Andrey
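The drm_dev_enter/exit sprinkling discussed in this thread can be sketched roughly as follows - a userspace mock where a plain flag stands in for the real SRCU-based primitives, and all names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mock of the guard pattern: every kernel-side CPU access to a DMA
 * buffer is bracketed by an enter/exit pair, and the access is simply
 * skipped once the device is gone. The real drm_dev_enter/exit also
 * enter/leave an SRCU read-side section; a flag stands in here. */

struct mock_dev {
    bool unplugged;
};

static bool mock_dev_enter(struct mock_dev *dev)
{
    return !dev->unplugged;   /* real code also takes an SRCU read lock */
}

static void mock_dev_exit(struct mock_dev *dev)
{
    (void)dev;                /* real code drops the SRCU read lock */
}

/* Guarded ring-buffer style write: returns true iff the write happened. */
static bool guarded_write(struct mock_dev *dev, char *ring, size_t len)
{
    if (!mock_dev_enter(dev))
        return false;         /* device unplugged: don't touch the pages */
    memset(ring, 0xab, len);
    mock_dev_exit(dev);
    return true;
}

/* Demo: the same access succeeds before unplug and is skipped after. */
static int demo_guard(void)
{
    struct mock_dev dev = { .unplugged = false };
    char ring[16] = { 0 };

    bool before = guarded_write(&dev, ring, sizeof(ring));
    dev.unplugged = true;                 /* hot unplug happens here */
    bool after = guarded_write(&dev, ring, sizeof(ring));

    return before && !after;
}
```

As the thread notes, the hard part is not the pattern itself but finding every "sensitive place" (page table updates, ring buffer accesses, FW memcpy) that needs the bracket.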
On 11/24/20 11:44 AM, Christian König wrote: > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky: >> >> On 11/24/20 2:41 AM, Christian König wrote: >>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky: >>>> >>>> On 11/23/20 3:41 PM, Christian König wrote: >>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky: >>>>>> >>>>>> On 11/23/20 3:20 PM, Christian König wrote: >>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky: >>>>>>>> >>>>>>>> On 11/25/20 5:42 AM, Christian König wrote: >>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky: >>>>>>>>>> It's needed to drop iommu backed pages on device unplug >>>>>>>>>> before device's IOMMU group is released. >>>>>>>>> >>>>>>>>> It would be cleaner if we could do the whole handling in TTM. I also >>>>>>>>> need to double check what you are doing with this function. >>>>>>>>> >>>>>>>>> Christian. >>>>>>>> >>>>>>>> >>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." >>>>>>>> to see >>>>>>>> how i use it. I don't see why this should go into TTM mid-layer - the >>>>>>>> stuff I do inside >>>>>>>> is vendor specific and also I don't think TTM is explicitly aware of >>>>>>>> IOMMU ? >>>>>>>> Do you mean you prefer the IOMMU notifier to be registered from within TTM >>>>>>>> and then use a hook to call into vendor specific handler ? >>>>>>> >>>>>>> No, that is really vendor specific. >>>>>>> >>>>>>> What I meant is to have a function like ttm_resource_manager_evict_all() >>>>>>> which you only need to call and all tt objects are unpopulated. >>>>>> >>>>>> >>>>>> So instead of this BO list i create and later iterate in amdgpu from the >>>>>> IOMMU patch you just want to do it within >>>>>> TTM with a single function ? Makes much more sense. >>>>> >>>>> Yes, exactly. >>>>> >>>>> The list_empty() checks we have in TTM for the LRU are actually not the >>>>> best idea, we should now check the pin_count instead. This way we could >>>>> also have a list of the pinned BOs in TTM. 
>>>> >>>> >>>> So from my IOMMU topology handler I will iterate the TTM LRU for the >>>> unpinned BOs and this new function for the pinned ones ? >>>> It's probably a good idea to combine both iterations into this new function >>>> to cover all the BOs allocated on the device. >>> >>> Yes, that's what I had in my mind as well. >>> >>>> >>>> >>>>> >>>>> BTW: Have you thought about what happens when we unpopulate a BO while we >>>>> still try to use a kernel mapping for it? That could have unforeseen >>>>> consequences. >>>> >>>> >>>> Are you asking what happens to kmap or vmap style mapped CPU accesses once >>>> we drop all the DMA backing pages for a particular BO ? Because for user >>>> mappings >>>> (mmap) we took care of this with dummy page reroute but indeed nothing was >>>> done for in kernel CPU mappings. >>> >>> Yes exactly that. >>> >>> In other words what happens if we free the ring buffer while the kernel >>> still writes to it? >>> >>> Christian. >> >> >> While we can't control user application accesses to the mapped buffers >> explicitly and hence we use page fault rerouting >> I am thinking that in this case we may be able to sprinkle >> drm_dev_enter/exit in any such sensitive place were we might >> CPU access a DMA buffer from the kernel ? > > Yes, I fear we are going to need that. > >> Things like CPU page table updates, ring buffer accesses and FW memcpy ? Is >> there other places ? > > Puh, good question. I have no idea. > >> Another point is that at this point the driver shouldn't access any such >> buffers as we are at the process finishing the device. >> AFAIK there is no page fault mechanism for kernel mappings so I don't think >> there is anything else to do ? > > Well there is a page fault handler for kernel mappings, but that one just > prints the stack trace into the system log and calls BUG(); :) > > Long story short we need to avoid any access to released pages after unplug. > No matter if it's from the kernel or userspace. 
I was just about to start guarding CPU accesses from the kernel to GTT or VRAM buffers with drm_dev_enter/exit, but then I looked more into the code and it seems like ttm_tt_unpopulate just deletes the DMA mappings (for the sake of device to main memory access). The kernel page table is not touched until the last bo refcount is dropped and the bo is released (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This is the case both for GTT BOs mapped to the kernel by kmap (or vmap) and for VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen after we unpopulate a BO while we still try to use a kernel mapping for it: the system memory pages backing GTT BOs are still mapped and not freed, and for VRAM BOs the same holds for the IO physical ranges mapped into the kernel page table, since iounmap wasn't called yet. I loaded the driver with vm_update_mode=3, meaning all VM updates are done using the CPU, and haven't seen any OOPSes after removing the device. I guess I can test it more by allocating GTT and VRAM BOs and trying to read/write to them after the device is removed. Andrey > > Regards, > Christian. > >> >> Andrey > >
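The single-function eviction idea from earlier in this exchange (one TTM helper, in the spirit of ttm_resource_manager_evict_all(), that unpopulates every BO on the device, pinned or not, before the IOMMU group goes away) could look roughly like the following userspace mock; names and data layout are assumptions, not the real TTM code:

```c
#include <stdbool.h>

/* Mock: one entry point walks every BO on the device - conceptually the
 * LRU for unpinned BOs plus a pinned list (distinguished by pin_count,
 * as suggested in the thread) - and drops the ttm_tt backing pages. */

#define MAX_BOS 8

struct mock_bo {
    int pin_count;
    bool populated;   /* has IOMMU-backed DMA pages */
};

struct mock_device {
    struct mock_bo bos[MAX_BOS];  /* stands in for the LRU + pinned lists */
    int num_bos;
};

static void tt_unpopulate(struct mock_bo *bo)
{
    bo->populated = false;        /* drop the DMA-mapped backing pages */
}

/* The proposed helper: unpopulate everything so nothing holds IOMMU
 * mappings once the device's IOMMU group is released. */
static void device_unpopulate_all(struct mock_device *dev)
{
    for (int i = 0; i < dev->num_bos; i++)
        tt_unpopulate(&dev->bos[i]);   /* pin_count does not exempt a BO */
}

/* Demo: returns the number of BOs still populated afterwards (should be 0,
 * including the pinned one). */
static int demo_unpopulate_all(void)
{
    struct mock_device dev = { .num_bos = 3 };
    dev.bos[0] = (struct mock_bo){ .pin_count = 0, .populated = true };
    dev.bos[1] = (struct mock_bo){ .pin_count = 1, .populated = true };
    dev.bos[2] = (struct mock_bo){ .pin_count = 0, .populated = true };

    device_unpopulate_all(&dev);

    int still_populated = 0;
    for (int i = 0; i < dev.num_bos; i++)
        still_populated += dev.bos[i].populated;
    return still_populated;
}
```

This is the shape that replaces the driver-side BO list from the IOMMU notifier patch with a single TTM-level call.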
Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > [SNIP] >>> >>> While we can't control user application accesses to the mapped >>> buffers explicitly and hence we use page fault rerouting >>> I am thinking that in this case we may be able to sprinkle >>> drm_dev_enter/exit in any such sensitive place were we might >>> CPU access a DMA buffer from the kernel ? >> >> Yes, I fear we are going to need that. >> >>> Things like CPU page table updates, ring buffer accesses and FW >>> memcpy ? Is there other places ? >> >> Puh, good question. I have no idea. >> >>> Another point is that at this point the driver shouldn't access any >>> such buffers as we are at the process finishing the device. >>> AFAIK there is no page fault mechanism for kernel mappings so I >>> don't think there is anything else to do ? >> >> Well there is a page fault handler for kernel mappings, but that one >> just prints the stack trace into the system log and calls BUG(); :) >> >> Long story short we need to avoid any access to released pages after >> unplug. No matter if it's from the kernel or userspace. > > > I was just about to start guarding with drm_dev_enter/exit CPU > accesses from kernel to GTT ot VRAM buffers but then i looked more in > the code > and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > sake of device to main memory access). Kernel page table is not touched > until last bo refcount is dropped and the bo is released > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > is both > for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > by ioremap. So as i see it, nothing will bad will happen after we > unpopulate a BO while we still try to use a kernel mapping for it, > system memory pages backing GTT BOs are still mapped and not freed and > for > VRAM BOs same is for the IO physical ranges mapped into the kernel > page table since iounmap wasn't called yet. 
The problem is the system pages would be freed, and if the kernel driver still happily writes to them we are pretty much busted because we write to freed up memory. Christian. > I loaded the driver with vm_update_mode=3 > meaning all VM updates done using CPU and hasn't seen any OOPs after > removing the device. I guess i can test it more by allocating GTT and > VRAM BOs > and trying to read/write to them after device is removed. > > Andrey > > >> >> Regards, >> Christian. >> >>> >>> Andrey >> >> > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
On Wed, Dec 16, 2020 at 9:04 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > > [SNIP] > >>> > >>> While we can't control user application accesses to the mapped > >>> buffers explicitly and hence we use page fault rerouting > >>> I am thinking that in this case we may be able to sprinkle > >>> drm_dev_enter/exit in any such sensitive place were we might > >>> CPU access a DMA buffer from the kernel ? > >> > >> Yes, I fear we are going to need that. > >> > >>> Things like CPU page table updates, ring buffer accesses and FW > >>> memcpy ? Is there other places ? > >> > >> Puh, good question. I have no idea. > >> > >>> Another point is that at this point the driver shouldn't access any > >>> such buffers as we are at the process finishing the device. > >>> AFAIK there is no page fault mechanism for kernel mappings so I > >>> don't think there is anything else to do ? > >> > >> Well there is a page fault handler for kernel mappings, but that one > >> just prints the stack trace into the system log and calls BUG(); :) > >> > >> Long story short we need to avoid any access to released pages after > >> unplug. No matter if it's from the kernel or userspace. > > > > > > I was just about to start guarding with drm_dev_enter/exit CPU > > accesses from kernel to GTT ot VRAM buffers but then i looked more in > > the code > > and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > > sake of device to main memory access). Kernel page table is not touched > > until last bo refcount is dropped and the bo is released > > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > > is both > > for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > > by ioremap. 
So as i see it, nothing will bad will happen after we > > unpopulate a BO while we still try to use a kernel mapping for it, > > system memory pages backing GTT BOs are still mapped and not freed and > > for > > VRAM BOs same is for the IO physical ranges mapped into the kernel > > page table since iounmap wasn't called yet. > > The problem is the system pages would be freed and if we kernel driver > still happily write to them we are pretty much busted because we write > to freed up memory. Similar for vram, if this is actual hotunplug and then replug, there's going to be a different device behind the same mmio bar range most likely (the higher bridges all have the same windows assigned), and that's bad news if we keep using it for current drivers. So we really have to point all these cpu ptes to some other place. -Daniel > > Christian. > > > I loaded the driver with vm_update_mode=3 > > meaning all VM updates done using CPU and hasn't seen any OOPs after > > removing the device. I guess i can test it more by allocating GTT and > > VRAM BOs > > and trying to read/write to them after device is removed. > > > > Andrey > > > > > >> > >> Regards, > >> Christian. > >> > >>> > >>> Andrey > >> > >> > > _______________________________________________ > > amd-gfx mailing list > > amd-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx >
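The "point all these cpu ptes to some other place" idea has a close userspace analogue that can actually be demonstrated: replacing a live mapping with anonymous zero pages via MAP_FIXED, so stale pointers keep dereferencing safely but no longer reach the old backing store. This is only an illustration of the concept, not kernel code:

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Map a page, write to it, then "unplug": install fresh anonymous zero
 * pages at the same virtual address with MAP_FIXED. The old pointer
 * still works, but now hits the dummy pages instead of the old memory -
 * the userspace cousin of redirecting kernel PTEs to a dummy page.
 * Returns 1 if the redirect behaved as expected, -1 on mmap failure. */
static int demo_remap_to_dummy(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 42;                      /* "device" memory in active use */

    /* Simulated unplug: MAP_FIXED atomically replaces the old mapping. */
    void *q = mmap(p, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (q == MAP_FAILED)
        return -1;

    int redirected = (p[0] == 0);   /* old pointer now reads the dummy page */
    munmap(p, pg);
    return redirected;
}
```

The kernel-side problem the thread wrestles with is that no such atomic replacement exists for arbitrary kmap/vmap/ioremap users without synchronizing against them first.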
On 12/16/20 9:21 AM, Daniel Vetter wrote: > On Wed, Dec 16, 2020 at 9:04 AM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>> [SNIP] >>>>> While we can't control user application accesses to the mapped >>>>> buffers explicitly and hence we use page fault rerouting >>>>> I am thinking that in this case we may be able to sprinkle >>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>> CPU access a DMA buffer from the kernel ? >>>> Yes, I fear we are going to need that. >>>> >>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>> memcpy ? Is there other places ? >>>> Puh, good question. I have no idea. >>>> >>>>> Another point is that at this point the driver shouldn't access any >>>>> such buffers as we are at the process finishing the device. >>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>> don't think there is anything else to do ? >>>> Well there is a page fault handler for kernel mappings, but that one >>>> just prints the stack trace into the system log and calls BUG(); :) >>>> >>>> Long story short we need to avoid any access to released pages after >>>> unplug. No matter if it's from the kernel or userspace. >>> >>> I was just about to start guarding with drm_dev_enter/exit CPU >>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>> the code >>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>> sake of device to main memory access). Kernel page table is not touched >>> until last bo refcount is dropped and the bo is released >>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This >>> is both >>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>> by ioremap. 
So as i see it, nothing will bad will happen after we >>> unpopulate a BO while we still try to use a kernel mapping for it, >>> system memory pages backing GTT BOs are still mapped and not freed and >>> for >>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>> page table since iounmap wasn't called yet. >> The problem is the system pages would be freed and if we kernel driver >> still happily write to them we are pretty much busted because we write >> to freed up memory. OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will release the GTT BO pages. But then isn't there a problem in ttm_bo_release, since ttm_bo_cleanup_memtype_use (which also leads to page release) comes before bo->destroy, which unmaps the pages from the kernel page table ? Won't we end up writing to freed memory in this time interval ? Don't we need to postpone page freeing until after the kernel page table unmapping ? > Similar for vram, if this is actual hotunplug and then replug, there's > going to be a different device behind the same mmio bar range most > likely (the higher bridges all this have the same windows assigned), No idea how this actually works, but if we haven't called iounmap yet doesn't it mean that those physical ranges that are still mapped into the page table should be reserved and cannot be reused for another device ? As a guess, maybe another subrange from the higher bridge's total range will be allocated. > and that's bad news if we keep using it for current drivers. So we > really have to point all these cpu ptes to some other place. We can't just unmap it without syncing against any in-kernel accesses to those buffers, and since the page faulting technique we use for user-mapped buffers seems not to be possible for kernel-mapped buffers, I am not sure how to do it gracefully... Andrey > -Daniel > >> Christian. >> >>> I loaded the driver with vm_update_mode=3 >>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>> removing the device. 
I guess i can test it more by allocating GTT and >>> VRAM BOs >>> and trying to read/write to them after device is removed. >>> >>> Andrey >>> >>> >>>> Regards, >>>> Christian. >>>> >>>>> Andrey >>>> >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >
Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > > On 12/16/20 9:21 AM, Daniel Vetter wrote: >> On Wed, Dec 16, 2020 at 9:04 AM Christian König >> <ckoenig.leichtzumerken@gmail.com> wrote: >>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>>> [SNIP] >>>>>> While we can't control user application accesses to the mapped >>>>>> buffers explicitly and hence we use page fault rerouting >>>>>> I am thinking that in this case we may be able to sprinkle >>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>> CPU access a DMA buffer from the kernel ? >>>>> Yes, I fear we are going to need that. >>>>> >>>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>>> memcpy ? Is there other places ? >>>>> Puh, good question. I have no idea. >>>>> >>>>>> Another point is that at this point the driver shouldn't access any >>>>>> such buffers as we are at the process finishing the device. >>>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>>> don't think there is anything else to do ? >>>>> Well there is a page fault handler for kernel mappings, but that one >>>>> just prints the stack trace into the system log and calls BUG(); :) >>>>> >>>>> Long story short we need to avoid any access to released pages after >>>>> unplug. No matter if it's from the kernel or userspace. >>>> >>>> I was just about to start guarding with drm_dev_enter/exit CPU >>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>>> the code >>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>>> sake of device to main memory access). Kernel page table is not >>>> touched >>>> until last bo refcount is dropped and the bo is released >>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This >>>> is both >>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>>> by ioremap. 
So as i see it, nothing will bad will happen after we >>>> unpopulate a BO while we still try to use a kernel mapping for it, >>>> system memory pages backing GTT BOs are still mapped and not freed and >>>> for >>>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>>> page table since iounmap wasn't called yet. >>> The problem is the system pages would be freed and if we kernel driver >>> still happily write to them we are pretty much busted because we write >>> to freed up memory. > > > OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > release > the GTT BO pages. But then isn't there a problem in ttm_bo_release since > ttm_bo_cleanup_memtype_use which also leads to pages release comes > before bo->destroy which unmaps the pages from kernel page table ? Won't > we have end up writing to freed memory in this time interval ? Don't we > need to postpone pages freeing to after kernel page table unmapping ? BOs are only destroyed when there is a guarantee that nobody is accessing them any more. The problem here is that the pages as well as the VRAM can be immediately reused after the hotplug event. > > >> Similar for vram, if this is actual hotunplug and then replug, there's >> going to be a different device behind the same mmio bar range most >> likely (the higher bridges all this have the same windows assigned), > > > No idea how this actually works but if we haven't called iounmap yet > doesn't it mean that those physical ranges that are still mapped into > page > table should be reserved and cannot be reused for another > device ? As a guess, maybe another subrange from the higher bridge's > total > range will be allocated. Nope, the PCIe subsystem doesn't care about any ioremap still active for a range when it is hotplugged. > >> and that's bad news if we keep using it for current drivers. So we >> really have to point all these cpu ptes to some other place. 
> > We can't just unmap it without syncing against any in kernel accesses > to those buffers > and since page faulting technique we use for user mapped buffers seems > to not be possible > for kernel mapped buffers I am not sure how to do it gracefully... We could try to replace the kmap with a dummy page under the hood, but that is extremely tricky. Especially since BOs which are just 1 page in size could point to the linear mapping directly. Christian. > > Andrey > > >> -Daniel >> >>> Christian. >>> >>>> I loaded the driver with vm_update_mode=3 >>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>> removing the device. I guess i can test it more by allocating GTT and >>>> VRAM BOs >>>> and trying to read/write to them after device is removed. >>>> >>>> Andrey >>>> >>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> Andrey >>>>> >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>> >>
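The "dummy page under the hood" idea can be sketched with one level of indirection on the kmap pointer - hypothetical names throughout, and note this is precisely what breaks down for BOs whose users already hold a raw pointer into the linear mapping:

```c
#include <stddef.h>

/* Mock: kernel users re-read a swappable indirection on every access
 * instead of caching the raw kmap pointer, so the unplug path can swap
 * in a dummy page and stale users land there harmlessly. */

static char real_page[4096];
static char dummy_page[4096];

struct mock_kmap {
    char *page;               /* swappable under the hood */
};

static char *kmap_deref(struct mock_kmap *map)
{
    return map->page;         /* every access goes through the indirection */
}

static void unplug_swap_to_dummy(struct mock_kmap *map)
{
    map->page = dummy_page;   /* late writers now hit the dummy page */
}

/* Demo: a write after the swap ends up in the dummy page, leaving the
 * (now conceptually freed) real page untouched. Returns 1 on success. */
static int demo_kmap_swap(void)
{
    struct mock_kmap map = { .page = real_page };

    real_page[0] = 7;
    int before = kmap_deref(&map)[0];   /* 7: still the real backing page */

    unplug_swap_to_dummy(&map);
    kmap_deref(&map)[0] = 99;           /* lands in the dummy page */

    return before == 7 && real_page[0] == 7 && dummy_page[0] == 99;
}
```

A user that cached `kmap_deref()`'s return value once - the moral equivalent of a BO pointing straight into the linear map - would dodge the swap entirely, which is the trickiness Christian points at.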
On Wed, Dec 16, 2020 at 5:18 PM Christian König <christian.koenig@amd.com> wrote: > > Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > > > > On 12/16/20 9:21 AM, Daniel Vetter wrote: > >> On Wed, Dec 16, 2020 at 9:04 AM Christian König > >> <ckoenig.leichtzumerken@gmail.com> wrote: > >>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > >>>> [SNIP] > >>>>>> While we can't control user application accesses to the mapped > >>>>>> buffers explicitly and hence we use page fault rerouting > >>>>>> I am thinking that in this case we may be able to sprinkle > >>>>>> drm_dev_enter/exit in any such sensitive place were we might > >>>>>> CPU access a DMA buffer from the kernel ? > >>>>> Yes, I fear we are going to need that. > >>>>> > >>>>>> Things like CPU page table updates, ring buffer accesses and FW > >>>>>> memcpy ? Is there other places ? > >>>>> Puh, good question. I have no idea. > >>>>> > >>>>>> Another point is that at this point the driver shouldn't access any > >>>>>> such buffers as we are at the process finishing the device. > >>>>>> AFAIK there is no page fault mechanism for kernel mappings so I > >>>>>> don't think there is anything else to do ? > >>>>> Well there is a page fault handler for kernel mappings, but that one > >>>>> just prints the stack trace into the system log and calls BUG(); :) > >>>>> > >>>>> Long story short we need to avoid any access to released pages after > >>>>> unplug. No matter if it's from the kernel or userspace. > >>>> > >>>> I was just about to start guarding with drm_dev_enter/exit CPU > >>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in > >>>> the code > >>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > >>>> sake of device to main memory access). Kernel page table is not > >>>> touched > >>>> until last bo refcount is dropped and the bo is released > >>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). 
This > >>>> is both > >>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > >>>> by ioremap. So as i see it, nothing will bad will happen after we > >>>> unpopulate a BO while we still try to use a kernel mapping for it, > >>>> system memory pages backing GTT BOs are still mapped and not freed and > >>>> for > >>>> VRAM BOs same is for the IO physical ranges mapped into the kernel > >>>> page table since iounmap wasn't called yet. > >>> The problem is the system pages would be freed and if we kernel driver > >>> still happily write to them we are pretty much busted because we write > >>> to freed up memory. > > > > > > OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > > release > > the GTT BO pages. But then isn't there a problem in ttm_bo_release since > > ttm_bo_cleanup_memtype_use which also leads to pages release comes > > before bo->destroy which unmaps the pages from kernel page table ? Won't > > we have end up writing to freed memory in this time interval ? Don't we > > need to postpone pages freeing to after kernel page table unmapping ? > > BOs are only destroyed when there is a guarantee that nobody is > accessing them any more. > > The problem here is that the pages as well as the VRAM can be > immediately reused after the hotplug event. > > > > > > >> Similar for vram, if this is actual hotunplug and then replug, there's > >> going to be a different device behind the same mmio bar range most > >> likely (the higher bridges all this have the same windows assigned), > > > > > > No idea how this actually works but if we haven't called iounmap yet > > doesn't it mean that those physical ranges that are still mapped into > > page > > table should be reserved and cannot be reused for another > > device ? As a guess, maybe another subrange from the higher bridge's > > total > > range will be allocated. > > Nope, the PCIe subsystem doesn't care about any ioremap still active for > a range when it is hotplugged. 
> > > > >> and that's bad news if we keep using it for current drivers. So we > >> really have to point all these cpu ptes to some other place. > > > > > > We can't just unmap it without syncing against any in kernel accesses > > to those buffers > > and since page faulting technique we use for user mapped buffers seems > > to not be possible > > for kernel mapped buffers I am not sure how to do it gracefully... > > We could try to replace the kmap with a dummy page under the hood, but > that is extremely tricky. > > Especially since BOs which are just 1 page in size could point to the > linear mapping directly. I think it's just more work. Essentially - convert as much as possible of the kernel mappings to vmap_local, which Thomas Zimmermann is rolling out. That way a dma_resv_lock will serve as a barrier, and ofc any new vmap needs to fail or hand out a dummy mapping. - handle fbcon somehow. I think shutting it all down should work out. - worst case keep the system backing storage around for shared dma-buf until the other non-dynamic driver releases it. for vram we require dynamic importers (and maybe it wasn't such a bright idea to allow pinning of importer buffers, might need to revisit that). Cheers, Daniel > > Christian. > > > > > Andrey > > > > > >> -Daniel > >> > >>> Christian. > >>> > >>>> I loaded the driver with vm_update_mode=3 > >>>> meaning all VM updates done using CPU and hasn't seen any OOPs after > >>>> removing the device. I guess i can test it more by allocating GTT and > >>>> VRAM BOs > >>>> and trying to read/write to them after device is removed. > >>>> > >>>> Andrey > >>>> > >>>> > >>>>> Regards, > >>>>> Christian. 
> >>>>> > >>>>>> Andrey > >>>>> > >>>> _______________________________________________ > >>>> amd-gfx mailing list > >>>> amd-gfx@lists.freedesktop.org > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C6ee2a428d88a4742f45a08d8a1cde9c7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437253067654506%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WRL2smY7iemgZdlH3taUZCoa8h%2BuaKD1Hv0tbHUclAQ%3D&reserved=0 > >>>> > >> >
On Wed, Dec 16, 2020 at 6:12 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Wed, Dec 16, 2020 at 5:18 PM Christian König > <christian.koenig@amd.com> wrote: > > > > Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > > > > > > On 12/16/20 9:21 AM, Daniel Vetter wrote: > > >> On Wed, Dec 16, 2020 at 9:04 AM Christian König > > >> <ckoenig.leichtzumerken@gmail.com> wrote: > > >>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > > >>>> [SNIP] > > >>>>>> While we can't control user application accesses to the mapped > > >>>>>> buffers explicitly and hence we use page fault rerouting > > >>>>>> I am thinking that in this case we may be able to sprinkle > > >>>>>> drm_dev_enter/exit in any such sensitive place were we might > > >>>>>> CPU access a DMA buffer from the kernel ? > > >>>>> Yes, I fear we are going to need that. > > >>>>> > > >>>>>> Things like CPU page table updates, ring buffer accesses and FW > > >>>>>> memcpy ? Is there other places ? > > >>>>> Puh, good question. I have no idea. > > >>>>> > > >>>>>> Another point is that at this point the driver shouldn't access any > > >>>>>> such buffers as we are at the process finishing the device. > > >>>>>> AFAIK there is no page fault mechanism for kernel mappings so I > > >>>>>> don't think there is anything else to do ? > > >>>>> Well there is a page fault handler for kernel mappings, but that one > > >>>>> just prints the stack trace into the system log and calls BUG(); :) > > >>>>> > > >>>>> Long story short we need to avoid any access to released pages after > > >>>>> unplug. No matter if it's from the kernel or userspace. > > >>>> > > >>>> I was just about to start guarding with drm_dev_enter/exit CPU > > >>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in > > >>>> the code > > >>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > > >>>> sake of device to main memory access). 
Kernel page table is not > > >>>> touched > > >>>> until last bo refcount is dropped and the bo is released > > >>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > > >>>> is both > > >>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > > >>>> by ioremap. So as i see it, nothing will bad will happen after we > > >>>> unpopulate a BO while we still try to use a kernel mapping for it, > > >>>> system memory pages backing GTT BOs are still mapped and not freed and > > >>>> for > > >>>> VRAM BOs same is for the IO physical ranges mapped into the kernel > > >>>> page table since iounmap wasn't called yet. > > >>> The problem is the system pages would be freed and if we kernel driver > > >>> still happily write to them we are pretty much busted because we write > > >>> to freed up memory. > > > > > > > > > OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > > > release > > > the GTT BO pages. But then isn't there a problem in ttm_bo_release since > > > ttm_bo_cleanup_memtype_use which also leads to pages release comes > > > before bo->destroy which unmaps the pages from kernel page table ? Won't > > > we have end up writing to freed memory in this time interval ? Don't we > > > need to postpone pages freeing to after kernel page table unmapping ? > > > > BOs are only destroyed when there is a guarantee that nobody is > > accessing them any more. > > > > The problem here is that the pages as well as the VRAM can be > > immediately reused after the hotplug event. 
> > > > > > > > > > >> Similar for vram, if this is actual hotunplug and then replug, there's > > >> going to be a different device behind the same mmio bar range most > > >> likely (the higher bridges all this have the same windows assigned), > > > > > > > > > No idea how this actually works but if we haven't called iounmap yet > > > doesn't it mean that those physical ranges that are still mapped into > > > page > > > table should be reserved and cannot be reused for another > > > device ? As a guess, maybe another subrange from the higher bridge's > > > total > > > range will be allocated. > > > > Nope, the PCIe subsystem doesn't care about any ioremap still active for > > a range when it is hotplugged. > > > > > > > >> and that's bad news if we keep using it for current drivers. So we > > >> really have to point all these cpu ptes to some other place. > > > > > > > > > We can't just unmap it without syncing against any in kernel accesses > > > to those buffers > > > and since page faulting technique we use for user mapped buffers seems > > > to not be possible > > > for kernel mapped buffers I am not sure how to do it gracefully... > > > > We could try to replace the kmap with a dummy page under the hood, but > > that is extremely tricky. > > > > Especially since BOs which are just 1 page in size could point to the > > linear mapping directly. > > I think it's just more work. Essentially > - convert as much as possible of the kernel mappings to vmap_local, > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will > serve as a barrier, and ofc any new vmap needs to fail or hand out a > dummy mapping. > - handle fbcon somehow. I think shutting it all down should work out. Oh also for fbdev I think best to switch over to drm_fbdev_generic_setup(). That should handle all the lifetime fun correctly already, minus the vram complication. 
So at least fewer oopses for other reasons :-) -Daniel > - worst case keep the system backing storage around for shared dma-buf > until the other non-dynamic driver releases it. for vram we require > dynamic importers (and maybe it wasn't such a bright idea to allow > pinning of importer buffers, might need to revisit that). > > Cheers, Daniel > > > > > Christian. > > > > > > > > Andrey > > > > > > > > >> -Daniel > > >> > > >>> Christian. > > >>> > > >>>> I loaded the driver with vm_update_mode=3 > > >>>> meaning all VM updates done using CPU and hasn't seen any OOPs after > > >>>> removing the device. I guess i can test it more by allocating GTT and > > >>>> VRAM BOs > > >>>> and trying to read/write to them after device is removed. > > >>>> > > >>>> Andrey > > >>>> > > >>>> > > >>>>> Regards, > > >>>>> Christian. > > >>>>> > > >>>>>> Andrey > > >>>>> > > >>>> _______________________________________________ > > >>>> amd-gfx mailing list > > >>>> amd-gfx@lists.freedesktop.org > > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C6ee2a428d88a4742f45a08d8a1cde9c7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437253067654506%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WRL2smY7iemgZdlH3taUZCoa8h%2BuaKD1Hv0tbHUclAQ%3D&reserved=0 > > >>>> > > >> > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On 12/16/20 12:12 PM, Daniel Vetter wrote: > On Wed, Dec 16, 2020 at 5:18 PM Christian König > <christian.koenig@amd.com> wrote: >> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: >>> On 12/16/20 9:21 AM, Daniel Vetter wrote: >>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König >>>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>>>>> [SNIP] >>>>>>>> While we can't control user application accesses to the mapped >>>>>>>> buffers explicitly and hence we use page fault rerouting >>>>>>>> I am thinking that in this case we may be able to sprinkle >>>>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>>>> CPU access a DMA buffer from the kernel ? >>>>>>> Yes, I fear we are going to need that. >>>>>>> >>>>>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>>>>> memcpy ? Is there other places ? >>>>>>> Puh, good question. I have no idea. >>>>>>> >>>>>>>> Another point is that at this point the driver shouldn't access any >>>>>>>> such buffers as we are at the process finishing the device. >>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>>>>> don't think there is anything else to do ? >>>>>>> Well there is a page fault handler for kernel mappings, but that one >>>>>>> just prints the stack trace into the system log and calls BUG(); :) >>>>>>> >>>>>>> Long story short we need to avoid any access to released pages after >>>>>>> unplug. No matter if it's from the kernel or userspace. >>>>>> I was just about to start guarding with drm_dev_enter/exit CPU >>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>>>>> the code >>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>>>>> sake of device to main memory access). Kernel page table is not >>>>>> touched >>>>>> until last bo refcount is dropped and the bo is released >>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). 
This >>>>>> is both >>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>>>>> by ioremap. So as i see it, nothing will bad will happen after we >>>>>> unpopulate a BO while we still try to use a kernel mapping for it, >>>>>> system memory pages backing GTT BOs are still mapped and not freed and >>>>>> for >>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>>>>> page table since iounmap wasn't called yet. >>>>> The problem is the system pages would be freed and if we kernel driver >>>>> still happily write to them we are pretty much busted because we write >>>>> to freed up memory. >>> >>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will >>> release >>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since >>> ttm_bo_cleanup_memtype_use which also leads to pages release comes >>> before bo->destroy which unmaps the pages from kernel page table ? Won't >>> we have end up writing to freed memory in this time interval ? Don't we >>> need to postpone pages freeing to after kernel page table unmapping ? >> BOs are only destroyed when there is a guarantee that nobody is >> accessing them any more. >> >> The problem here is that the pages as well as the VRAM can be >> immediately reused after the hotplug event. >> >>> >>>> Similar for vram, if this is actual hotunplug and then replug, there's >>>> going to be a different device behind the same mmio bar range most >>>> likely (the higher bridges all this have the same windows assigned), >>> >>> No idea how this actually works but if we haven't called iounmap yet >>> doesn't it mean that those physical ranges that are still mapped into >>> page >>> table should be reserved and cannot be reused for another >>> device ? As a guess, maybe another subrange from the higher bridge's >>> total >>> range will be allocated. >> Nope, the PCIe subsystem doesn't care about any ioremap still active for >> a range when it is hotplugged. 
>> >>>> and that's bad news if we keep using it for current drivers. So we >>>> really have to point all these cpu ptes to some other place. >>> >>> We can't just unmap it without syncing against any in kernel accesses >>> to those buffers >>> and since page faulting technique we use for user mapped buffers seems >>> to not be possible >>> for kernel mapped buffers I am not sure how to do it gracefully... >> We could try to replace the kmap with a dummy page under the hood, but >> that is extremely tricky. >> >> Especially since BOs which are just 1 page in size could point to the >> linear mapping directly. > I think it's just more work. Essentially > - convert as much as possible of the kernel mappings to vmap_local, > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will > serve as a barrier, and ofc any new vmap needs to fail or hand out a > dummy mapping. Read those patches. I am not sure how this helps with protecting against accesses to released backing pages or IO physical ranges of BO which is already mapped during the unplug event ? Andrey > - handle fbcon somehow. I think shutting it all down should work out. > - worst case keep the system backing storage around for shared dma-buf > until the other non-dynamic driver releases it. for vram we require > dynamic importers (and maybe it wasn't such a bright idea to allow > pinning of importer buffers, might need to revisit that). > > Cheers, Daniel > >> Christian. >> >>> Andrey >>> >>> >>>> -Daniel >>>> >>>>> Christian. >>>>> >>>>>> I loaded the driver with vm_update_mode=3 >>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>>>> removing the device. I guess i can test it more by allocating GTT and >>>>>> VRAM BOs >>>>>> and trying to read/write to them after device is removed. >>>>>> >>>>>> Andrey >>>>>> >>>>>> >>>>>>> Regards, >>>>>>> Christian. 
>>>>>>> >>>>>>>> Andrey >>>>>> _______________________________________________ >>>>>> amd-gfx mailing list >>>>>> amd-gfx@lists.freedesktop.org >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C37b4367cbdaa4133b01d08d8a1e5bf41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437355430007196%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=r0bIJS3HUDkFPqyFinAt4eahM%2BjF01DObZ5abgstzSU%3D&reserved=0 >>>>>> >
On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote: > > > On 12/16/20 12:12 PM, Daniel Vetter wrote: > > On Wed, Dec 16, 2020 at 5:18 PM Christian König > > <christian.koenig@amd.com> wrote: > >> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > >>> On 12/16/20 9:21 AM, Daniel Vetter wrote: > >>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König > >>>> <ckoenig.leichtzumerken@gmail.com> wrote: > >>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > >>>>>> [SNIP] > >>>>>>>> While we can't control user application accesses to the mapped > >>>>>>>> buffers explicitly and hence we use page fault rerouting > >>>>>>>> I am thinking that in this case we may be able to sprinkle > >>>>>>>> drm_dev_enter/exit in any such sensitive place were we might > >>>>>>>> CPU access a DMA buffer from the kernel ? > >>>>>>> Yes, I fear we are going to need that. > >>>>>>> > >>>>>>>> Things like CPU page table updates, ring buffer accesses and FW > >>>>>>>> memcpy ? Is there other places ? > >>>>>>> Puh, good question. I have no idea. > >>>>>>> > >>>>>>>> Another point is that at this point the driver shouldn't access any > >>>>>>>> such buffers as we are at the process finishing the device. > >>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I > >>>>>>>> don't think there is anything else to do ? > >>>>>>> Well there is a page fault handler for kernel mappings, but that one > >>>>>>> just prints the stack trace into the system log and calls BUG(); :) > >>>>>>> > >>>>>>> Long story short we need to avoid any access to released pages after > >>>>>>> unplug. No matter if it's from the kernel or userspace. > >>>>>> I was just about to start guarding with drm_dev_enter/exit CPU > >>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in > >>>>>> the code > >>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > >>>>>> sake of device to main memory access). 
Kernel page table is not > >>>>>> touched > >>>>>> until last bo refcount is dropped and the bo is released > >>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > >>>>>> is both > >>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > >>>>>> by ioremap. So as i see it, nothing will bad will happen after we > >>>>>> unpopulate a BO while we still try to use a kernel mapping for it, > >>>>>> system memory pages backing GTT BOs are still mapped and not freed and > >>>>>> for > >>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel > >>>>>> page table since iounmap wasn't called yet. > >>>>> The problem is the system pages would be freed and if we kernel driver > >>>>> still happily write to them we are pretty much busted because we write > >>>>> to freed up memory. > >>> > >>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > >>> release > >>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since > >>> ttm_bo_cleanup_memtype_use which also leads to pages release comes > >>> before bo->destroy which unmaps the pages from kernel page table ? Won't > >>> we have end up writing to freed memory in this time interval ? Don't we > >>> need to postpone pages freeing to after kernel page table unmapping ? > >> BOs are only destroyed when there is a guarantee that nobody is > >> accessing them any more. > >> > >> The problem here is that the pages as well as the VRAM can be > >> immediately reused after the hotplug event. 
> >> > >>> > >>>> Similar for vram, if this is actual hotunplug and then replug, there's > >>>> going to be a different device behind the same mmio bar range most > >>>> likely (the higher bridges all this have the same windows assigned), > >>> > >>> No idea how this actually works but if we haven't called iounmap yet > >>> doesn't it mean that those physical ranges that are still mapped into > >>> page > >>> table should be reserved and cannot be reused for another > >>> device ? As a guess, maybe another subrange from the higher bridge's > >>> total > >>> range will be allocated. > >> Nope, the PCIe subsystem doesn't care about any ioremap still active for > >> a range when it is hotplugged. > >> > >>>> and that's bad news if we keep using it for current drivers. So we > >>>> really have to point all these cpu ptes to some other place. > >>> > >>> We can't just unmap it without syncing against any in kernel accesses > >>> to those buffers > >>> and since page faulting technique we use for user mapped buffers seems > >>> to not be possible > >>> for kernel mapped buffers I am not sure how to do it gracefully... > >> We could try to replace the kmap with a dummy page under the hood, but > >> that is extremely tricky. > >> > >> Especially since BOs which are just 1 page in size could point to the > >> linear mapping directly. > > I think it's just more work. Essentially > > - convert as much as possible of the kernel mappings to vmap_local, > > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will > > serve as a barrier, and ofc any new vmap needs to fail or hand out a > > dummy mapping. > > Read those patches. I am not sure how this helps with protecting > against accesses to released backing pages or IO physical ranges of BO > which is already mapped during the unplug event ? 
By eliminating such users, and replacing them with local maps which are strictly bound in how long they can exist (and hence we can serialize against them finishing in our hotunplug code). It doesn't solve all your problems, but it's a tool to get there. -Daniel > Andrey > > > > - handle fbcon somehow. I think shutting it all down should work out. > > - worst case keep the system backing storage around for shared dma-buf > > until the other non-dynamic driver releases it. for vram we require > > dynamic importers (and maybe it wasn't such a bright idea to allow > > pinning of importer buffers, might need to revisit that). > > > > Cheers, Daniel > > > >> Christian. > >> > >>> Andrey > >>> > >>> > >>>> -Daniel > >>>> > >>>>> Christian. > >>>>> > >>>>>> I loaded the driver with vm_update_mode=3 > >>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after > >>>>>> removing the device. I guess i can test it more by allocating GTT and > >>>>>> VRAM BOs > >>>>>> and trying to read/write to them after device is removed. > >>>>>> > >>>>>> Andrey > >>>>>> > >>>>>> > >>>>>>> Regards, > >>>>>>> Christian. > >>>>>>> > >>>>>>>> Andrey > >>>>>> _______________________________________________ > >>>>>> amd-gfx mailing list > >>>>>> amd-gfx@lists.freedesktop.org > >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C37b4367cbdaa4133b01d08d8a1e5bf41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437355430007196%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=r0bIJS3HUDkFPqyFinAt4eahM%2BjF01DObZ5abgstzSU%3D&reserved=0 > >>>>>> > >
On 12/16/20 6:15 PM, Daniel Vetter wrote: > On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky > <Andrey.Grodzovsky@amd.com> wrote: >> >> On 12/16/20 12:12 PM, Daniel Vetter wrote: >>> On Wed, Dec 16, 2020 at 5:18 PM Christian König >>> <christian.koenig@amd.com> wrote: >>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: >>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote: >>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König >>>>>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>>>>>>> [SNIP] >>>>>>>>>> While we can't control user application accesses to the mapped >>>>>>>>>> buffers explicitly and hence we use page fault rerouting >>>>>>>>>> I am thinking that in this case we may be able to sprinkle >>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>>>>>> CPU access a DMA buffer from the kernel ? >>>>>>>>> Yes, I fear we are going to need that. >>>>>>>>> >>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>>>>>>> memcpy ? Is there other places ? >>>>>>>>> Puh, good question. I have no idea. >>>>>>>>> >>>>>>>>>> Another point is that at this point the driver shouldn't access any >>>>>>>>>> such buffers as we are at the process finishing the device. >>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>>>>>>> don't think there is anything else to do ? >>>>>>>>> Well there is a page fault handler for kernel mappings, but that one >>>>>>>>> just prints the stack trace into the system log and calls BUG(); :) >>>>>>>>> >>>>>>>>> Long story short we need to avoid any access to released pages after >>>>>>>>> unplug. No matter if it's from the kernel or userspace. 
>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU >>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>>>>>>> the code >>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>>>>>>> sake of device to main memory access). Kernel page table is not >>>>>>>> touched >>>>>>>> until last bo refcount is dropped and the bo is released >>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This >>>>>>>> is both >>>>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>>>>>>> by ioremap. So as i see it, nothing will bad will happen after we >>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it, >>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and >>>>>>>> for >>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>>>>>>> page table since iounmap wasn't called yet. >>>>>>> The problem is the system pages would be freed and if we kernel driver >>>>>>> still happily write to them we are pretty much busted because we write >>>>>>> to freed up memory. >>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will >>>>> release >>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since >>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes >>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't >>>>> we have end up writing to freed memory in this time interval ? Don't we >>>>> need to postpone pages freeing to after kernel page table unmapping ? >>>> BOs are only destroyed when there is a guarantee that nobody is >>>> accessing them any more. >>>> >>>> The problem here is that the pages as well as the VRAM can be >>>> immediately reused after the hotplug event. 
>>>> >>>>>> Similar for vram, if this is actual hotunplug and then replug, there's >>>>>> going to be a different device behind the same mmio bar range most >>>>>> likely (the higher bridges all this have the same windows assigned), >>>>> No idea how this actually works but if we haven't called iounmap yet >>>>> doesn't it mean that those physical ranges that are still mapped into >>>>> page >>>>> table should be reserved and cannot be reused for another >>>>> device ? As a guess, maybe another subrange from the higher bridge's >>>>> total >>>>> range will be allocated. >>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for >>>> a range when it is hotplugged. >>>> >>>>>> and that's bad news if we keep using it for current drivers. So we >>>>>> really have to point all these cpu ptes to some other place. >>>>> We can't just unmap it without syncing against any in kernel accesses >>>>> to those buffers >>>>> and since page faulting technique we use for user mapped buffers seems >>>>> to not be possible >>>>> for kernel mapped buffers I am not sure how to do it gracefully... >>>> We could try to replace the kmap with a dummy page under the hood, but >>>> that is extremely tricky. >>>> >>>> Especially since BOs which are just 1 page in size could point to the >>>> linear mapping directly. >>> I think it's just more work. Essentially >>> - convert as much as possible of the kernel mappings to vmap_local, >>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will >>> serve as a barrier, and ofc any new vmap needs to fail or hand out a >>> dummy mapping. >> Read those patches. I am not sure how this helps with protecting >> against accesses to released backing pages or IO physical ranges of BO >> which is already mapped during the unplug event ? 
> By eliminating such users, and replacing them with local maps which > are strictly bound in how long they can exist (and hence we can > serialize against them finishing in our hotunplug code). Not sure I see how serializing against BO map/unmap helps - our problem as you described is that once the device is extracted and then something else quickly takes its place in the PCI topology and gets assigned the same physical IO ranges, then our driver will start accessing this new device because our 'zombie' BOs are still pointing to those ranges. Another point regarding serializing - the problem is that some of those BOs are very long lived, take for example the HW command ring buffer Christian mentioned before - (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span is basically the entire time the device exists, it's destroyed only in the SW fini stage (when the last drm_dev reference is dropped) and so should I grab its dma_resv_lock from amdgpu_pci_remove code and wait for it to be unmapped before proceeding with the PCI remove code ? This can take unbounded time and that's why I don't understand how serializing will help. Andrey > It doesn't > solve all your problems, but it's a tool to get there. > -Daniel > >> Andrey >> >> >>> - handle fbcon somehow. I think shutting it all down should work out. >>> - worst case keep the system backing storage around for shared dma-buf >>> until the other non-dynamic driver releases it. for vram we require >>> dynamic importers (and maybe it wasn't such a bright idea to allow >>> pinning of importer buffers, might need to revisit that). >>> >>> Cheers, Daniel >>> >>>> Christian. >>>> >>>>> Andrey >>>>> >>>>> >>>>>> -Daniel >>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> I loaded the driver with vm_update_mode=3 >>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>>>>>> removing the device.

I guess i can test it more by allocating GTT and >>>>>>>> VRAM BOs >>>>>>>> and trying to read/write to them after device is removed. >>>>>>>> >>>>>>>> Andrey >>>>>>>> >>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>>> Andrey >>>>>>>> _______________________________________________ >>>>>>>> amd-gfx mailing list >>>>>>>> amd-gfx@lists.freedesktop.org >>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C5849827698d1428065d408d8a2188518%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437573486129589%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=uahs3RgQxLpUJ9hCzE8pvK9UWFCpyIz1i4MNKikl0tY%3D&reserved=0 >>>>>>>> > >
On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote: > > On 12/16/20 6:15 PM, Daniel Vetter wrote: > > On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky > > <Andrey.Grodzovsky@amd.com> wrote: > > > > > > On 12/16/20 12:12 PM, Daniel Vetter wrote: > > > > On Wed, Dec 16, 2020 at 5:18 PM Christian König > > > > <christian.koenig@amd.com> wrote: > > > > > Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > > > > > > On 12/16/20 9:21 AM, Daniel Vetter wrote: > > > > > > > On Wed, Dec 16, 2020 at 9:04 AM Christian König > > > > > > > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > > > > > Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > > > > > > > > > [SNIP] > > > > > > > > > > > While we can't control user application accesses to the mapped > > > > > > > > > > > buffers explicitly and hence we use page fault rerouting > > > > > > > > > > > I am thinking that in this case we may be able to sprinkle > > > > > > > > > > > drm_dev_enter/exit in any such sensitive place were we might > > > > > > > > > > > CPU access a DMA buffer from the kernel ? > > > > > > > > > > Yes, I fear we are going to need that. > > > > > > > > > > > > > > > > > > > > > Things like CPU page table updates, ring buffer accesses and FW > > > > > > > > > > > memcpy ? Is there other places ? > > > > > > > > > > Puh, good question. I have no idea. > > > > > > > > > > > > > > > > > > > > > Another point is that at this point the driver shouldn't access any > > > > > > > > > > > such buffers as we are at the process finishing the device. > > > > > > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I > > > > > > > > > > > don't think there is anything else to do ? 
> > > > > > > > > > Well there is a page fault handler for kernel mappings, but that one > > > > > > > > > > just prints the stack trace into the system log and calls BUG(); :) > > > > > > > > > > > > > > > > > > > > Long story short we need to avoid any access to released pages after > > > > > > > > > > unplug. No matter if it's from the kernel or userspace. > > > > > > > > > I was just about to start guarding with drm_dev_enter/exit CPU > > > > > > > > > accesses from kernel to GTT ot VRAM buffers but then i looked more in > > > > > > > > > the code > > > > > > > > > and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > > > > > > > > > sake of device to main memory access). Kernel page table is not > > > > > > > > > touched > > > > > > > > > until last bo refcount is dropped and the bo is released > > > > > > > > > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > > > > > > > > > is both > > > > > > > > > for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > > > > > > > > > by ioremap. So as i see it, nothing will bad will happen after we > > > > > > > > > unpopulate a BO while we still try to use a kernel mapping for it, > > > > > > > > > system memory pages backing GTT BOs are still mapped and not freed and > > > > > > > > > for > > > > > > > > > VRAM BOs same is for the IO physical ranges mapped into the kernel > > > > > > > > > page table since iounmap wasn't called yet. > > > > > > > > The problem is the system pages would be freed and if we kernel driver > > > > > > > > still happily write to them we are pretty much busted because we write > > > > > > > > to freed up memory. > > > > > > OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > > > > > > release > > > > > > the GTT BO pages. 
But then isn't there a problem in ttm_bo_release since > > > > > > ttm_bo_cleanup_memtype_use which also leads to pages release comes > > > > > > before bo->destroy which unmaps the pages from kernel page table ? Won't > > > > > > we have end up writing to freed memory in this time interval ? Don't we > > > > > > need to postpone pages freeing to after kernel page table unmapping ? > > > > > BOs are only destroyed when there is a guarantee that nobody is > > > > > accessing them any more. > > > > > > > > > > The problem here is that the pages as well as the VRAM can be > > > > > immediately reused after the hotplug event. > > > > > > > > > > > > Similar for vram, if this is actual hotunplug and then replug, there's > > > > > > > going to be a different device behind the same mmio bar range most > > > > > > > likely (the higher bridges all this have the same windows assigned), > > > > > > No idea how this actually works but if we haven't called iounmap yet > > > > > > doesn't it mean that those physical ranges that are still mapped into > > > > > > page > > > > > > table should be reserved and cannot be reused for another > > > > > > device ? As a guess, maybe another subrange from the higher bridge's > > > > > > total > > > > > > range will be allocated. > > > > > Nope, the PCIe subsystem doesn't care about any ioremap still active for > > > > > a range when it is hotplugged. > > > > > > > > > > > > and that's bad news if we keep using it for current drivers. So we > > > > > > > really have to point all these cpu ptes to some other place. > > > > > > We can't just unmap it without syncing against any in kernel accesses > > > > > > to those buffers > > > > > > and since page faulting technique we use for user mapped buffers seems > > > > > > to not be possible > > > > > > for kernel mapped buffers I am not sure how to do it gracefully... > > > > > We could try to replace the kmap with a dummy page under the hood, but > > > > > that is extremely tricky. 
> > > > > > > > > > Especially since BOs which are just 1 page in size could point to the > > > > > linear mapping directly. > > > > I think it's just more work. Essentially > > > > - convert as much as possible of the kernel mappings to vmap_local, > > > > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will > > > > serve as a barrier, and ofc any new vmap needs to fail or hand out a > > > > dummy mapping. > > > Read those patches. I am not sure how this helps with protecting > > > against accesses to released backing pages or IO physical ranges of BO > > > which is already mapped during the unplug event ? > > By eliminating such users, and replacing them with local maps which > > are strictly bound in how long they can exist (and hence we can > > serialize against them finishing in our hotunplug code). > > Not sure I see how serializing against BO map/unmap helps - our problem as > you described is that once > device is extracted and then something else quickly takes it's place in the > PCI topology > and gets assigned same physical IO ranges, then our driver will start accessing this > new device because our 'zombie' BOs are still pointing to those ranges. Until your driver's remove callback is finished the ranges stay reserved. If that's not the case, then hotunplug would be fundamentally impossible ot handle correctly. Of course all the mmio actions will time out, so it might take some time to get through it all. > Another point regarding serializing - problem is that some of those BOs are > very long lived, take for example the HW command > ring buffer Christian mentioned before - > (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span > is basically for the entire time the device exists, it's destroyed only in > the SW fini stage (when last drm_dev > reference is dropped) and so should I grab it's dma_resv_lock from > amdgpu_pci_remove code and wait > for it to be unmapped before proceeding with the PCI remove code ? 
This can > take unbound time and that why I don't understand > how serializing will help. Uh you need to untangle that. After hw cleanup is done no one is allowed to touch that ringbuffer bo anymore from the kernel. That's what drm_dev_enter/exit guards are for. Like you say we cant wait for all sw references to disappear. The vmap_local is for mappings done by other drivers, through the dma-buf interface (where "other drivers" can include fbdev/fbcon, if you use the generic helpers). -Daniel > > Andrey > > > > It doesn't > > solve all your problems, but it's a tool to get there. > > -Daniel > > > > > Andrey > > > > > > > > > > - handle fbcon somehow. I think shutting it all down should work out. > > > > - worst case keep the system backing storage around for shared dma-buf > > > > until the other non-dynamic driver releases it. for vram we require > > > > dynamic importers (and maybe it wasn't such a bright idea to allow > > > > pinning of importer buffers, might need to revisit that). > > > > > > > > Cheers, Daniel > > > > > > > > > Christian. > > > > > > > > > > > Andrey > > > > > > > > > > > > > > > > > > > -Daniel > > > > > > > > > > > > > > > Christian. > > > > > > > > > > > > > > > > > I loaded the driver with vm_update_mode=3 > > > > > > > > > meaning all VM updates done using CPU and hasn't seen any OOPs after > > > > > > > > > removing the device. I guess i can test it more by allocating GTT and > > > > > > > > > VRAM BOs > > > > > > > > > and trying to read/write to them after device is removed. > > > > > > > > > > > > > > > > > > Andrey > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Christian. 
> > > > > > > > > > > > > > > > > > > > > Andrey > > > > >
On 12/17/20 7:01 AM, Daniel Vetter wrote: > On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote: >> On 12/16/20 6:15 PM, Daniel Vetter wrote: >>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky >>> <Andrey.Grodzovsky@amd.com> wrote: >>>> On 12/16/20 12:12 PM, Daniel Vetter wrote: >>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König >>>>> <christian.koenig@amd.com> wrote: >>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: >>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote: >>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König >>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>>>>>>>>> [SNIP] >>>>>>>>>>>> While we can't control user application accesses to the mapped >>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting >>>>>>>>>>>> I am thinking that in this case we may be able to sprinkle >>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>>>>>>>> CPU access a DMA buffer from the kernel ? >>>>>>>>>>> Yes, I fear we are going to need that. >>>>>>>>>>> >>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>>>>>>>>> memcpy ? Is there other places ? >>>>>>>>>>> Puh, good question. I have no idea. >>>>>>>>>>> >>>>>>>>>>>> Another point is that at this point the driver shouldn't access any >>>>>>>>>>>> such buffers as we are at the process finishing the device. >>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>>>>>>>>> don't think there is anything else to do ? >>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one >>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :) >>>>>>>>>>> >>>>>>>>>>> Long story short we need to avoid any access to released pages after >>>>>>>>>>> unplug. No matter if it's from the kernel or userspace. 
>>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU >>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>>>>>>>>> the code >>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>>>>>>>>> sake of device to main memory access). Kernel page table is not >>>>>>>>>> touched >>>>>>>>>> until last bo refcount is dropped and the bo is released >>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This >>>>>>>>>> is both >>>>>>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>>>>>>>>> by ioremap. So as i see it, nothing will bad will happen after we >>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it, >>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and >>>>>>>>>> for >>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>>>>>>>>> page table since iounmap wasn't called yet. >>>>>>>>> The problem is the system pages would be freed and if we kernel driver >>>>>>>>> still happily write to them we are pretty much busted because we write >>>>>>>>> to freed up memory. >>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will >>>>>>> release >>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since >>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes >>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't >>>>>>> we have end up writing to freed memory in this time interval ? Don't we >>>>>>> need to postpone pages freeing to after kernel page table unmapping ? >>>>>> BOs are only destroyed when there is a guarantee that nobody is >>>>>> accessing them any more. >>>>>> >>>>>> The problem here is that the pages as well as the VRAM can be >>>>>> immediately reused after the hotplug event. 
>>>>>> >>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's >>>>>>>> going to be a different device behind the same mmio bar range most >>>>>>>> likely (the higher bridges all this have the same windows assigned), >>>>>>> No idea how this actually works but if we haven't called iounmap yet >>>>>>> doesn't it mean that those physical ranges that are still mapped into >>>>>>> page >>>>>>> table should be reserved and cannot be reused for another >>>>>>> device ? As a guess, maybe another subrange from the higher bridge's >>>>>>> total >>>>>>> range will be allocated. >>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for >>>>>> a range when it is hotplugged. >>>>>> >>>>>>>> and that's bad news if we keep using it for current drivers. So we >>>>>>>> really have to point all these cpu ptes to some other place. >>>>>>> We can't just unmap it without syncing against any in kernel accesses >>>>>>> to those buffers >>>>>>> and since page faulting technique we use for user mapped buffers seems >>>>>>> to not be possible >>>>>>> for kernel mapped buffers I am not sure how to do it gracefully... >>>>>> We could try to replace the kmap with a dummy page under the hood, but >>>>>> that is extremely tricky. >>>>>> >>>>>> Especially since BOs which are just 1 page in size could point to the >>>>>> linear mapping directly. >>>>> I think it's just more work. Essentially >>>>> - convert as much as possible of the kernel mappings to vmap_local, >>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will >>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a >>>>> dummy mapping. >>>> Read those patches. I am not sure how this helps with protecting >>>> against accesses to released backing pages or IO physical ranges of BO >>>> which is already mapped during the unplug event ? 
>>> By eliminating such users, and replacing them with local maps which >>> are strictly bound in how long they can exist (and hence we can >>> serialize against them finishing in our hotunplug code). >> Not sure I see how serializing against BO map/unmap helps - our problem as >> you described is that once >> device is extracted and then something else quickly takes it's place in the >> PCI topology >> and gets assigned same physical IO ranges, then our driver will start accessing this >> new device because our 'zombie' BOs are still pointing to those ranges. > Until your driver's remove callback is finished the ranges stay reserved. The ranges stay reserved until unmapped, which happens in bo->destroy, which for most internally allocated buffers is during sw_fini, when the last drm_dev reference is dropped. > If that's not the case, then hotunplug would be fundamentally impossible > ot handle correctly. > > Of course all the mmio actions will time out, so it might take some time > to get through it all. I found that the PCI code provides a pci_device_is_present function we can use to avoid timeouts - it reads the device vendor ID and checks whether all 1s are returned. We can call it from within the register accessors before attempting a read/write.
I would assume we are not allowed to touch it once we identified the device is gone, in order to minimize the chance of accidental writes to some other device which might now occupy those IO ranges? > That's what > drm_dev_enter/exit guards are for. Like you say we cant wait for all sw > references to disappear. Yes, it didn't make sense to me why we would use vmap_local for internally allocated buffers. I think we should also guard register reads/writes for the same reason as above. > > The vmap_local is for mappings done by other drivers, through the dma-buf > interface (where "other drivers" can include fbdev/fbcon, if you use the > generic helpers). > -Daniel Ok, so I assumed that with vmap_local you were trying to solve the problem of quick reinsertion of another device into the same MMIO range that my driver still points to, but are you actually trying to solve the issue of exported dma buffers outliving the device? For this we have the drm_device refcount in the GEM layer, I think. Andrey > >> Andrey >> >> >>> It doesn't >>> solve all your problems, but it's a tool to get there. >>> -Daniel >>> >>>> Andrey >>>> >>>> >>>>> - handle fbcon somehow. I think shutting it all down should work out. >>>>> - worst case keep the system backing storage around for shared dma-buf >>>>> until the other non-dynamic driver releases it. for vram we require >>>>> dynamic importers (and maybe it wasn't such a bright idea to allow >>>>> pinning of importer buffers, might need to revisit that). >>>>> >>>>> Cheers, Daniel >>>>> >>>>>> Christian. >>>>>> >>>>>>> Andrey >>>>>>> >>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>>> I loaded the driver with vm_update_mode=3 >>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and >>>>>>>>>> VRAM BOs >>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>> >>>>>>>>>> Andrey >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Christian. >>>>>>>>>>> >>>>>>>>>>>> Andrey >>>
[SNIP] >>> By eliminating such users, and replacing them with local maps which >>>> are strictly bound in how long they can exist (and hence we can >>>> serialize against them finishing in our hotunplug code). >>> Not sure I see how serializing against BO map/unmap helps - our >>> problem as >>> you described is that once >>> device is extracted and then something else quickly takes it's place >>> in the >>> PCI topology >>> and gets assigned same physical IO ranges, then our driver will >>> start accessing this >>> new device because our 'zombie' BOs are still pointing to those ranges. >> Until your driver's remove callback is finished the ranges stay >> reserved. > > > The ranges stay reserved until unmapped which happens in bo->destroy I'm not sure of that. Why do you think that? > which for most internally allocated buffers is during sw_fini when > last drm_put > is called. > > >> If that's not the case, then hotunplug would be fundamentally impossible >> ot handle correctly. >> >> Of course all the mmio actions will time out, so it might take some time >> to get through it all. > > > I found that PCI code provides pci_device_is_present function > we can use to avoid timeouts - it reads device vendor and checks if > all 1s is returned > or not. We can call it from within register accessors before trying > read/write That's way too much overhead! We need to keep the overhead much lower or it will result in quite a performance drop. I suggest instead thinking about adding drm_dev_enter/exit guards. Christian.
> >>> Another point regarding serializing - problem is that some of those >>> BOs are >>> very long lived, take for example the HW command >>> ring buffer Christian mentioned before - >>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span >>> is basically for the entire time the device exists, it's destroyed >>> only in >>> the SW fini stage (when last drm_dev >>> reference is dropped) and so should I grab it's dma_resv_lock from >>> amdgpu_pci_remove code and wait >>> for it to be unmapped before proceeding with the PCI remove code ? >>> This can >>> take unbound time and that why I don't understand >>> how serializing will help. >> Uh you need to untangle that. After hw cleanup is done no one is allowed >> to touch that ringbuffer bo anymore from the kernel. > > > I would assume we are not allowed to touch it once we identified the > device is > gone in order to minimize the chance of accidental writes to some > other device which might now > occupy those IO ranges ? > > >> That's what >> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw >> references to disappear. > > > Yes, didn't make sense to me why would we use vmap_local for internally > allocated buffers. I think we should also guard registers read/writes > for the > same reason as above. > > >> >> The vmap_local is for mappings done by other drivers, through the >> dma-buf >> interface (where "other drivers" can include fbdev/fbcon, if you use the >> generic helpers). >> -Daniel > > > Ok, so I assumed that with vmap_local you were trying to solve the > problem of quick reinsertion > of another device into same MMIO range that my driver still points too > but actually are you trying to solve > the issue of exported dma buffers outliving the device ? For this we > have drm_device refcount in the GEM layer > i think. > > Andrey > > >> >>> Andrey >>> >>> >>>> It doesn't >>>> solve all your problems, but it's a tool to get there. 
>>>> -Daniel >>>> >>>>> Andrey >>>>> >>>>> >>>>>> - handle fbcon somehow. I think shutting it all down should work >>>>>> out. >>>>>> - worst case keep the system backing storage around for shared >>>>>> dma-buf >>>>>> until the other non-dynamic driver releases it. for vram we require >>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow >>>>>> pinning of importer buffers, might need to revisit that). >>>>>> >>>>>> Cheers, Daniel >>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> Andrey >>>>>>>> >>>>>>>> >>>>>>>>> -Daniel >>>>>>>>> >>>>>>>>>> Christian. >>>>>>>>>> >>>>>>>>>>> I loaded the driver with vm_update_mode=3 >>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any >>>>>>>>>>> OOPs after >>>>>>>>>>> removing the device. I guess i can test it more by >>>>>>>>>>> allocating GTT and >>>>>>>>>>> VRAM BOs >>>>>>>>>>> and trying to read/write to them after device is removed. >>>>>>>>>>> >>>>>>>>>>> Andrey >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Christian. >>>>>>>>>>>> >>>>>>>>>>>>> Andrey >>>>
On 12/17/20 3:10 PM, Christian König wrote: > [SNIP] >>>> By eliminating such users, and replacing them with local maps which >>>>> are strictly bound in how long they can exist (and hence we can >>>>> serialize against them finishing in our hotunplug code). >>>> Not sure I see how serializing against BO map/unmap helps - our problem as >>>> you described is that once >>>> device is extracted and then something else quickly takes it's place in the >>>> PCI topology >>>> and gets assigned same physical IO ranges, then our driver will start >>>> accessing this >>>> new device because our 'zombie' BOs are still pointing to those ranges. >>> Until your driver's remove callback is finished the ranges stay reserved. >> >> >> The ranges stay reserved until unmapped which happens in bo->destroy > > I'm not sure of that. Why do you think that? Because of this sequence: ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap. Is there another place I am missing? > >> which for most internally allocated buffers is during sw_fini when last drm_put >> is called. >> >> >>> If that's not the case, then hotunplug would be fundamentally impossible >>> ot handle correctly. >>> >>> Of course all the mmio actions will time out, so it might take some time >>> to get through it all. >> >> >> I found that PCI code provides pci_device_is_present function >> we can use to avoid timeouts - it reads device vendor and checks if all 1s is >> returned >> or not. We can call it from within register accessors before trying read/write > > That's way to much overhead! We need to keep that much lower or it will result > in quite a performance drop. > > I suggest to rather think about adding drm_dev_enter/exit guards. Sure, though this check is just a bit upstream of the disconnect event. Ultimately neither approach is watertight. Andrey > > Christian.
> >> >>>> Another point regarding serializing - problem is that some of those BOs are >>>> very long lived, take for example the HW command >>>> ring buffer Christian mentioned before - >>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span >>>> is basically for the entire time the device exists, it's destroyed only in >>>> the SW fini stage (when last drm_dev >>>> reference is dropped) and so should I grab it's dma_resv_lock from >>>> amdgpu_pci_remove code and wait >>>> for it to be unmapped before proceeding with the PCI remove code ? This can >>>> take unbound time and that why I don't understand >>>> how serializing will help. >>> Uh you need to untangle that. After hw cleanup is done no one is allowed >>> to touch that ringbuffer bo anymore from the kernel. >> >> >> I would assume we are not allowed to touch it once we identified the device is >> gone in order to minimize the chance of accidental writes to some other >> device which might now >> occupy those IO ranges ? >> >> >>> That's what >>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw >>> references to disappear. >> >> >> Yes, didn't make sense to me why would we use vmap_local for internally >> allocated buffers. I think we should also guard registers read/writes for the >> same reason as above. >> >> >>> >>> The vmap_local is for mappings done by other drivers, through the dma-buf >>> interface (where "other drivers" can include fbdev/fbcon, if you use the >>> generic helpers). >>> -Daniel >> >> >> Ok, so I assumed that with vmap_local you were trying to solve the problem of >> quick reinsertion >> of another device into same MMIO range that my driver still points too but >> actually are you trying to solve >> the issue of exported dma buffers outliving the device ? For this we have >> drm_device refcount in the GEM layer >> i think. >> >> Andrey >> >> >>> >>>> Andrey >>>> >>>> >>>>> It doesn't >>>>> solve all your problems, but it's a tool to get there. 
>>>>> -Daniel >>>>> >>>>>> Andrey >>>>>> >>>>>> >>>>>>> - handle fbcon somehow. I think shutting it all down should work out. >>>>>>> - worst case keep the system backing storage around for shared dma-buf >>>>>>> until the other non-dynamic driver releases it. for vram we require >>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow >>>>>>> pinning of importer buffers, might need to revisit that). >>>>>>> >>>>>>> Cheers, Daniel >>>>>>> >>>>>>>> Christian. >>>>>>>> >>>>>>>>> Andrey >>>>>>>>> >>>>>>>>> >>>>>>>>>> -Daniel >>>>>>>>>> >>>>>>>>>>> Christian. >>>>>>>>>>> >>>>>>>>>>>> I loaded the driver with vm_update_mode=3 >>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and >>>>>>>>>>>> VRAM BOs >>>>>>>>>>>> and trying to read/write to them after device is removed. >>>>>>>>>>>> >>>>>>>>>>>> Andrey >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Christian. >>>>>>>>>>>>> >>>>>>>>>>>>>> Andrey >>>>> >
On Thu, Dec 17, 2020 at 8:19 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote: > > > On 12/17/20 7:01 AM, Daniel Vetter wrote: > > On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote: > >> On 12/16/20 6:15 PM, Daniel Vetter wrote: > >>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky > >>> <Andrey.Grodzovsky@amd.com> wrote: > >>>> On 12/16/20 12:12 PM, Daniel Vetter wrote: > >>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König > >>>>> <christian.koenig@amd.com> wrote: > >>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: > >>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote: > >>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König > >>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote: > >>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: > >>>>>>>>>> [SNIP] > >>>>>>>>>>>> While we can't control user application accesses to the mapped > >>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting > >>>>>>>>>>>> I am thinking that in this case we may be able to sprinkle > >>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might > >>>>>>>>>>>> CPU access a DMA buffer from the kernel ? > >>>>>>>>>>> Yes, I fear we are going to need that. > >>>>>>>>>>> > >>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW > >>>>>>>>>>>> memcpy ? Is there other places ? > >>>>>>>>>>> Puh, good question. I have no idea. > >>>>>>>>>>> > >>>>>>>>>>>> Another point is that at this point the driver shouldn't access any > >>>>>>>>>>>> such buffers as we are at the process finishing the device. > >>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I > >>>>>>>>>>>> don't think there is anything else to do ? > >>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one > >>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :) > >>>>>>>>>>> > >>>>>>>>>>> Long story short we need to avoid any access to released pages after > >>>>>>>>>>> unplug. 
No matter if it's from the kernel or userspace. > >>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU > >>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in > >>>>>>>>>> the code > >>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the > >>>>>>>>>> sake of device to main memory access). Kernel page table is not > >>>>>>>>>> touched > >>>>>>>>>> until last bo refcount is dropped and the bo is released > >>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This > >>>>>>>>>> is both > >>>>>>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped > >>>>>>>>>> by ioremap. So as i see it, nothing will bad will happen after we > >>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it, > >>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and > >>>>>>>>>> for > >>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel > >>>>>>>>>> page table since iounmap wasn't called yet. > >>>>>>>>> The problem is the system pages would be freed and if we kernel driver > >>>>>>>>> still happily write to them we are pretty much busted because we write > >>>>>>>>> to freed up memory. > >>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will > >>>>>>> release > >>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since > >>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes > >>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't > >>>>>>> we have end up writing to freed memory in this time interval ? Don't we > >>>>>>> need to postpone pages freeing to after kernel page table unmapping ? > >>>>>> BOs are only destroyed when there is a guarantee that nobody is > >>>>>> accessing them any more. > >>>>>> > >>>>>> The problem here is that the pages as well as the VRAM can be > >>>>>> immediately reused after the hotplug event. 
> >>>>>> > >>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's > >>>>>>>> going to be a different device behind the same mmio bar range most > >>>>>>>> likely (the higher bridges all this have the same windows assigned), > >>>>>>> No idea how this actually works but if we haven't called iounmap yet > >>>>>>> doesn't it mean that those physical ranges that are still mapped into > >>>>>>> page > >>>>>>> table should be reserved and cannot be reused for another > >>>>>>> device ? As a guess, maybe another subrange from the higher bridge's > >>>>>>> total > >>>>>>> range will be allocated. > >>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for > >>>>>> a range when it is hotplugged. > >>>>>> > >>>>>>>> and that's bad news if we keep using it for current drivers. So we > >>>>>>>> really have to point all these cpu ptes to some other place. > >>>>>>> We can't just unmap it without syncing against any in kernel accesses > >>>>>>> to those buffers > >>>>>>> and since page faulting technique we use for user mapped buffers seems > >>>>>>> to not be possible > >>>>>>> for kernel mapped buffers I am not sure how to do it gracefully... > >>>>>> We could try to replace the kmap with a dummy page under the hood, but > >>>>>> that is extremely tricky. > >>>>>> > >>>>>> Especially since BOs which are just 1 page in size could point to the > >>>>>> linear mapping directly. > >>>>> I think it's just more work. Essentially > >>>>> - convert as much as possible of the kernel mappings to vmap_local, > >>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will > >>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a > >>>>> dummy mapping. > >>>> Read those patches. I am not sure how this helps with protecting > >>>> against accesses to released backing pages or IO physical ranges of BO > >>>> which is already mapped during the unplug event ? 
> >>> By eliminating such users, and replacing them with local maps which > >>> are strictly bound in how long they can exist (and hence we can > >>> serialize against them finishing in our hotunplug code). > >> Not sure I see how serializing against BO map/unmap helps - our problem as > >> you described is that once > >> device is extracted and then something else quickly takes it's place in the > >> PCI topology > >> and gets assigned same physical IO ranges, then our driver will start accessing this > >> new device because our 'zombie' BOs are still pointing to those ranges. > > Until your driver's remove callback is finished the ranges stay reserved. > > > The ranges stay reserved until unmapped which happens in bo->destroy > which for most internally allocated buffers is during sw_fini when last drm_put > is called. > > > > If that's not the case, then hotunplug would be fundamentally impossible > > ot handle correctly. > > > > Of course all the mmio actions will time out, so it might take some time > > to get through it all. > > > I found that PCI code provides pci_device_is_present function > we can use to avoid timeouts - it reads device vendor and checks if all 1s is > returned > or not. We can call it from within register accessors before trying read/write drm_dev_enter/exit is a _lot_ less overhead, plus makes a lot stronger guarantees for hotunplug ordering. Please use that one instead of hand-rolling something which only mostly works for closing hotunplug races. pciconfig access is really slow. 
> >> Another point regarding serializing - problem is that some of those BOs are > >> very long lived, take for example the HW command > >> ring buffer Christian mentioned before - > >> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span > >> is basically for the entire time the device exists, it's destroyed only in > >> the SW fini stage (when last drm_dev > >> reference is dropped) and so should I grab it's dma_resv_lock from > >> amdgpu_pci_remove code and wait > >> for it to be unmapped before proceeding with the PCI remove code ? This can > >> take unbound time and that why I don't understand > >> how serializing will help. > > Uh you need to untangle that. After hw cleanup is done no one is allowed > > to touch that ringbuffer bo anymore from the kernel. > > > I would assume we are not allowed to touch it once we identified the device is > gone in order to minimize the chance of accidental writes to some other device > which might now > occupy those IO ranges ? > > > > That's what > > drm_dev_enter/exit guards are for. Like you say we cant wait for all sw > > references to disappear. > > > Yes, didn't make sense to me why would we use vmap_local for internally > allocated buffers. I think we should also guard registers read/writes for the > same reason as above. > > > > > > The vmap_local is for mappings done by other drivers, through the dma-buf > > interface (where "other drivers" can include fbdev/fbcon, if you use the > > generic helpers). > > -Daniel > > > Ok, so I assumed that with vmap_local you were trying to solve the problem of > quick reinsertion > of another device into same MMIO range that my driver still points too but > actually are you trying to solve > the issue of exported dma buffers outliving the device ? For this we have > drm_device refcount in the GEM layer > i think. That's completely different lifetime problems. 
Don't mix them up :-) One problem is the hardware disappearing, and for that we _have_ to guarantee timeliness, or otherwise the pci subsystem gets pissed (since like you say, a new device might show up and need its mmio bars assigned to io ranges). The other is lifetime of the software objects we use as interfaces, both from userspace and from other kernel drivers. There we fundamentally can't enforce timely cleanup, and have to resort to refcounting. We need both. -Daniel > Andrey > > > > > >> Andrey > >> > >> > >>> It doesn't > >>> solve all your problems, but it's a tool to get there. > >>> -Daniel > >>> > >>>> Andrey > >>>> > >>>> > >>>>> - handle fbcon somehow. I think shutting it all down should work out. > >>>>> - worst case keep the system backing storage around for shared dma-buf > >>>>> until the other non-dynamic driver releases it. for vram we require > >>>>> dynamic importers (and maybe it wasn't such a bright idea to allow > >>>>> pinning of importer buffers, might need to revisit that). > >>>>> > >>>>> Cheers, Daniel > >>>>> > >>>>>> Christian. > >>>>>> > >>>>>>> Andrey > >>>>>>> > >>>>>>> > >>>>>>>> -Daniel > >>>>>>>> > >>>>>>>>> Christian. > >>>>>>>>> > >>>>>>>>>> I loaded the driver with vm_update_mode=3 > >>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after > >>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and > >>>>>>>>>> VRAM BOs > >>>>>>>>>> and trying to read/write to them after device is removed. > >>>>>>>>>> > >>>>>>>>>> Andrey > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Christian.
> >>>>>>>>>>> > >>>>>>>>>>>> Andrey > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> amd-gfx mailing list > >>>>>>>>>> amd-gfx@lists.freedesktop.org > >>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >>>>>>>>>> > >>>
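[Editor's note] The drm_dev_enter/exit pattern Daniel recommends above can be sketched in plain C. This is a hypothetical userspace model with stubbed primitives, not the real API — the kernel's drm_dev_enter() additionally hands back an index cookie and is backed by SRCU — and the struct and register-accessor names are invented for illustration:

```c
/* Userspace model of drm_dev_enter/exit-guarded register access.
 * All types and functions here are stand-ins for kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct drm_dev {
	bool unplugged;     /* set on hot-removal (real: drm_dev_unplug()) */
	uint32_t fake_mmio; /* stands in for an ioremap'ed register       */
};

/* Returns false once the device is gone; callers skip the hardware
 * access entirely instead of touching stale MMIO. */
static bool drm_dev_enter(struct drm_dev *dev)
{
	return !dev->unplugged;
}

static void drm_dev_exit(struct drm_dev *dev) { (void)dev; }

/* Register write guarded the way discussed above: a cheap flag check
 * per access, no PCI config-space round trip. */
static void wreg32(struct drm_dev *dev, uint32_t val)
{
	if (!drm_dev_enter(dev))
		return; /* device unplugged: drop the write */
	dev->fake_mmio = val;
	drm_dev_exit(dev);
}

static uint32_t rreg32(struct drm_dev *dev)
{
	uint32_t val = 0xffffffff; /* all 1s, like a read from a dead bus */

	if (drm_dev_enter(dev)) {
		val = dev->fake_mmio;
		drm_dev_exit(dev);
	}
	return val;
}
```

The point of the pattern is that the per-access cost is a flag/SRCU read, which is why it is preferred over probing config space on every register access.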
On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote: > > > On 12/17/20 3:10 PM, Christian König wrote: > > [SNIP] > >>>> By eliminating such users, and replacing them with local maps which > >>>>> are strictly bound in how long they can exist (and hence we can > >>>>> serialize against them finishing in our hotunplug code). > >>>> Not sure I see how serializing against BO map/unmap helps - our problem as > >>>> you described is that once > >>>> device is extracted and then something else quickly takes it's place in the > >>>> PCI topology > >>>> and gets assigned same physical IO ranges, then our driver will start > >>>> accessing this > >>>> new device because our 'zombie' BOs are still pointing to those ranges. > >>> Until your driver's remove callback is finished the ranges stay reserved. > >> > >> > >> The ranges stay reserved until unmapped which happens in bo->destroy > > > > I'm not sure of that. Why do you think that? > > > Because of this sequence > ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap > Is there another place I am missing ? iounmap is just the mapping, it doesn't reserve anything in the resource tree. And I don't think we should keep resources reserved past the pci remove callback, because that would upset the pci subsystem trying to assign resources to a newly hotplugged pci device. Also from a quick check amdgpu does not reserve the pci bars it's using. Somehow most drm drivers don't do that, not exactly sure why, maybe auto-enumeration of resources just works too good and we don't need the safety net of kernel/resource.c anymore. -Daniel > > > >> which for most internally allocated buffers is during sw_fini when last drm_put > >> is called. > >> > >> > >>> If that's not the case, then hotunplug would be fundamentally impossible > >>> ot handle correctly. > >>> > >>> Of course all the mmio actions will time out, so it might take some time > >>> to get through it all. 
> >> > >> > >> I found that PCI code provides pci_device_is_present function > >> we can use to avoid timeouts - it reads device vendor and checks if all 1s is > >> returned > >> or not. We can call it from within register accessors before trying read/write > > > > That's way to much overhead! We need to keep that much lower or it will result > > in quite a performance drop. > > > > I suggest to rather think about adding drm_dev_enter/exit guards. > > > Sure, this one is just a bit upstream to the disconnect event. Eventually none > of them is watertight. > > Andrey > > > > > > Christian. > > > >> > >>>> Another point regarding serializing - problem is that some of those BOs are > >>>> very long lived, take for example the HW command > >>>> ring buffer Christian mentioned before - > >>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span > >>>> is basically for the entire time the device exists, it's destroyed only in > >>>> the SW fini stage (when last drm_dev > >>>> reference is dropped) and so should I grab it's dma_resv_lock from > >>>> amdgpu_pci_remove code and wait > >>>> for it to be unmapped before proceeding with the PCI remove code ? This can > >>>> take unbound time and that why I don't understand > >>>> how serializing will help. > >>> Uh you need to untangle that. After hw cleanup is done no one is allowed > >>> to touch that ringbuffer bo anymore from the kernel. > >> > >> > >> I would assume we are not allowed to touch it once we identified the device is > >> gone in order to minimize the chance of accidental writes to some other > >> device which might now > >> occupy those IO ranges ? > >> > >> > >>> That's what > >>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw > >>> references to disappear. > >> > >> > >> Yes, didn't make sense to me why would we use vmap_local for internally > >> allocated buffers. I think we should also guard registers read/writes for the > >> same reason as above. 
> >> > >> > >>> > >>> The vmap_local is for mappings done by other drivers, through the dma-buf > >>> interface (where "other drivers" can include fbdev/fbcon, if you use the > >>> generic helpers). > >>> -Daniel > >> > >> > >> Ok, so I assumed that with vmap_local you were trying to solve the problem of > >> quick reinsertion > >> of another device into same MMIO range that my driver still points too but > >> actually are you trying to solve > >> the issue of exported dma buffers outliving the device ? For this we have > >> drm_device refcount in the GEM layer > >> i think. > >> > >> Andrey > >> > >> > >>> > >>>> Andrey > >>>> > >>>> > >>>>> It doesn't > >>>>> solve all your problems, but it's a tool to get there. > >>>>> -Daniel > >>>>> > >>>>>> Andrey > >>>>>> > >>>>>> > >>>>>>> - handle fbcon somehow. I think shutting it all down should work out. > >>>>>>> - worst case keep the system backing storage around for shared dma-buf > >>>>>>> until the other non-dynamic driver releases it. for vram we require > >>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow > >>>>>>> pinning of importer buffers, might need to revisit that). > >>>>>>> > >>>>>>> Cheers, Daniel > >>>>>>> > >>>>>>>> Christian. > >>>>>>>> > >>>>>>>>> Andrey > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> -Daniel > >>>>>>>>>> > >>>>>>>>>>> Christian. > >>>>>>>>>>> > >>>>>>>>>>>> I loaded the driver with vm_update_mode=3 > >>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after > >>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and > >>>>>>>>>>>> VRAM BOs > >>>>>>>>>>>> and trying to read/write to them after device is removed. > >>>>>>>>>>>> > >>>>>>>>>>>> Andrey > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> Christian. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>>> Andrey > >>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>> amd-gfx mailing list > >>>>>>>>>>>> amd-gfx@lists.freedesktop.org > >>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>> > >
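[Editor's note] For contrast, the pci_device_is_present() heuristic Andrey mentions can be modeled the same way: a config-space read of the vendor ID comes back as all 1s once the device is gone. Everything below is a stand-in (the fake_* names are invented; the real helper lives in the PCI core):

```c
/* Userspace model of the "vendor ID reads as all 1s" presence check. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PCI_VENDOR_ID_OFFSET 0x00

struct fake_pci_dev {
	bool present;
	uint16_t vendor; /* e.g. 0x1002 for AMD */
};

/* Stand-in for a config-space word read: a removed device no longer
 * answers, so the read returns 0xffff. */
static uint16_t fake_config_read16(struct fake_pci_dev *pdev, int where)
{
	(void)where;
	return pdev->present ? pdev->vendor : 0xffff;
}

static bool fake_pci_device_is_present(struct fake_pci_dev *pdev)
{
	return fake_config_read16(pdev, FAKE_PCI_VENDOR_ID_OFFSET) != 0xffff;
}
```

This is the check Christian objects to putting in the register accessors: each call costs a config-space transaction, far slower than the flag check done by drm_dev_enter/exit.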
On 12/17/20 3:48 PM, Daniel Vetter wrote: > On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky > <Andrey.Grodzovsky@amd.com> wrote: >> >> On 12/17/20 3:10 PM, Christian König wrote: >>> [SNIP] >>>>>> By eliminating such users, and replacing them with local maps which >>>>>>> are strictly bound in how long they can exist (and hence we can >>>>>>> serialize against them finishing in our hotunplug code). >>>>>> Not sure I see how serializing against BO map/unmap helps - our problem as >>>>>> you described is that once >>>>>> device is extracted and then something else quickly takes it's place in the >>>>>> PCI topology >>>>>> and gets assigned same physical IO ranges, then our driver will start >>>>>> accessing this >>>>>> new device because our 'zombie' BOs are still pointing to those ranges. >>>>> Until your driver's remove callback is finished the ranges stay reserved. >>>> >>>> The ranges stay reserved until unmapped which happens in bo->destroy >>> I'm not sure of that. Why do you think that? >> >> Because of this sequence >> ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap >> Is there another place I am missing ? > iounmap is just the mapping, it doesn't reserve anything in the resource tree. > > And I don't think we should keep resources reserved past the pci > remove callback, because that would upset the pci subsystem trying to > assign resources to a newly hotplugged pci device. I assumed we are talking about VA ranges still mapped in the page table. I just assumed that part of ioremap is also reservation of the mapped physical ranges. In fact, if we can explicitly reserve those ranges (as you mention here) then together with postponing system memory pages freeing/releasing back to the page pool until after BO is unmapped from the kernel address space I believe this could solve the issue of quick HW reinsertion and make all the drm_dev_enter/exit guarding obsolete.
Andrey > Also from a quick check amdgpu does not reserve the pci bars it's > using. Somehow most drm drivers don't do that, not exactly sure why, > maybe auto-enumeration of resources just works too good and we don't > need the safety net of kernel/resource.c anymore. > -Daniel > > >>>> which for most internally allocated buffers is during sw_fini when last drm_put >>>> is called. >>>> >>>> >>>>> If that's not the case, then hotunplug would be fundamentally impossible >>>>> ot handle correctly. >>>>> >>>>> Of course all the mmio actions will time out, so it might take some time >>>>> to get through it all. >>>> >>>> I found that PCI code provides pci_device_is_present function >>>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is >>>> returned >>>> or not. We can call it from within register accessors before trying read/write >>> That's way to much overhead! We need to keep that much lower or it will result >>> in quite a performance drop. >>> >>> I suggest to rather think about adding drm_dev_enter/exit guards. >> >> Sure, this one is just a bit upstream to the disconnect event. Eventually none >> of them is watertight. >> >> Andrey >> >> >>> Christian. >>> >>>>>> Another point regarding serializing - problem is that some of those BOs are >>>>>> very long lived, take for example the HW command >>>>>> ring buffer Christian mentioned before - >>>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span >>>>>> is basically for the entire time the device exists, it's destroyed only in >>>>>> the SW fini stage (when last drm_dev >>>>>> reference is dropped) and so should I grab it's dma_resv_lock from >>>>>> amdgpu_pci_remove code and wait >>>>>> for it to be unmapped before proceeding with the PCI remove code ? This can >>>>>> take unbound time and that why I don't understand >>>>>> how serializing will help. >>>>> Uh you need to untangle that. 
After hw cleanup is done no one is allowed >>>>> to touch that ringbuffer bo anymore from the kernel. >>>> >>>> I would assume we are not allowed to touch it once we identified the device is >>>> gone in order to minimize the chance of accidental writes to some other >>>> device which might now >>>> occupy those IO ranges ? >>>> >>>> >>>>> That's what >>>>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw >>>>> references to disappear. >>>> >>>> Yes, didn't make sense to me why would we use vmap_local for internally >>>> allocated buffers. I think we should also guard registers read/writes for the >>>> same reason as above. >>>> >>>> >>>>> The vmap_local is for mappings done by other drivers, through the dma-buf >>>>> interface (where "other drivers" can include fbdev/fbcon, if you use the >>>>> generic helpers). >>>>> -Daniel >>>> >>>> Ok, so I assumed that with vmap_local you were trying to solve the problem of >>>> quick reinsertion >>>> of another device into same MMIO range that my driver still points too but >>>> actually are you trying to solve >>>> the issue of exported dma buffers outliving the device ? For this we have >>>> drm_device refcount in the GEM layer >>>> i think. >>>> >>>> Andrey >>>> >>>> >>>>>> Andrey >>>>>> >>>>>> >>>>>>> It doesn't >>>>>>> solve all your problems, but it's a tool to get there. >>>>>>> -Daniel >>>>>>> >>>>>>>> Andrey >>>>>>>> >>>>>>>> >>>>>>>>> - handle fbcon somehow. I think shutting it all down should work out. >>>>>>>>> - worst case keep the system backing storage around for shared dma-buf >>>>>>>>> until the other non-dynamic driver releases it. for vram we require >>>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow >>>>>>>>> pinning of importer buffers, might need to revisit that). >>>>>>>>> >>>>>>>>> Cheers, Daniel >>>>>>>>> >>>>>>>>>> Christian. >>>>>>>>>> >>>>>>>>>>> Andrey >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -Daniel >>>>>>>>>>>> >>>>>>>>>>>>> Christian. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> I loaded the driver with vm_update_mode=3 >>>>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after >>>>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and >>>>>>>>>>>>>> VRAM BOs >>>>>>>>>>>>>> and trying to read/write to them after device is removed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Andrey >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Christian. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Andrey >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> amd-gfx mailing list >>>>>>>>>>>>>> amd-gfx@lists.freedesktop.org >>>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>>>>>>>>>>>> >>>>>>>>>>>>>> > >
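[Editor's note] The use-after-free window discussed earlier in the thread — backing pages released on the ttm_bo_cleanup_memtype_use() path before bo->destroy tears down the kernel mapping — and the postponed ordering Andrey proposes can be modeled abstractly. This toy sketch only tracks the ordering with flags; all names are invented and nothing here is real TTM code:

```c
/* Toy model: never return pages to the pool while a kernel mapping of
 * them is still live, or writes through the stale map hit freed memory. */
#include <assert.h>
#include <stdbool.h>

struct fake_bo {
	bool kmapped;     /* kernel mapping still live                 */
	bool pages_freed; /* backing pages returned to the pool        */
	bool uaf_window;  /* set if pages were freed while still mapped */
};

static void fake_kunmap(struct fake_bo *bo) { bo->kmapped = false; }

static void fake_free_pages(struct fake_bo *bo)
{
	if (bo->kmapped)
		bo->uaf_window = true; /* stale map now points at freed pages */
	bo->pages_freed = true;
}

/* Order questioned above: pages freed first, mapping torn down after. */
static void teardown_pages_first(struct fake_bo *bo)
{
	fake_free_pages(bo);
	fake_kunmap(bo);
}

/* Postponed order: unmap first, only then release the pages. */
static void teardown_unmap_first(struct fake_bo *bo)
{
	fake_kunmap(bo);
	fake_free_pages(bo);
}
```

The model just makes the invariant explicit; as the thread notes, the real constraint is stronger, since after hotplug both the pages and the VRAM range can be reused immediately by someone else.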
On 12/17/20 3:42 PM, Daniel Vetter wrote: > On Thu, Dec 17, 2020 at 8:19 PM Andrey Grodzovsky > <Andrey.Grodzovsky@amd.com> wrote: >> >> On 12/17/20 7:01 AM, Daniel Vetter wrote: >>> On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote: >>>> On 12/16/20 6:15 PM, Daniel Vetter wrote: >>>>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky >>>>> <Andrey.Grodzovsky@amd.com> wrote: >>>>>> On 12/16/20 12:12 PM, Daniel Vetter wrote: >>>>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König >>>>>>> <christian.koenig@amd.com> wrote: >>>>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky: >>>>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote: >>>>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König >>>>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky: >>>>>>>>>>>> [SNIP] >>>>>>>>>>>>>> While we can't control user application accesses to the mapped >>>>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting >>>>>>>>>>>>>> I am thinking that in this case we may be able to sprinkle >>>>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might >>>>>>>>>>>>>> CPU access a DMA buffer from the kernel ? >>>>>>>>>>>>> Yes, I fear we are going to need that. >>>>>>>>>>>>> >>>>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW >>>>>>>>>>>>>> memcpy ? Is there other places ? >>>>>>>>>>>>> Puh, good question. I have no idea. >>>>>>>>>>>>> >>>>>>>>>>>>>> Another point is that at this point the driver shouldn't access any >>>>>>>>>>>>>> such buffers as we are at the process finishing the device. >>>>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I >>>>>>>>>>>>>> don't think there is anything else to do ? 
>>>>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one >>>>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :) >>>>>>>>>>>>> >>>>>>>>>>>>> Long story short we need to avoid any access to released pages after >>>>>>>>>>>>> unplug. No matter if it's from the kernel or userspace. >>>>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU >>>>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in >>>>>>>>>>>> the code >>>>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the >>>>>>>>>>>> sake of device to main memory access). Kernel page table is not >>>>>>>>>>>> touched >>>>>>>>>>>> until last bo refcount is dropped and the bo is released >>>>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This >>>>>>>>>>>> is both >>>>>>>>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped >>>>>>>>>>>> by ioremap. So as i see it, nothing will bad will happen after we >>>>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it, >>>>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and >>>>>>>>>>>> for >>>>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel >>>>>>>>>>>> page table since iounmap wasn't called yet. >>>>>>>>>>> The problem is the system pages would be freed and if we kernel driver >>>>>>>>>>> still happily write to them we are pretty much busted because we write >>>>>>>>>>> to freed up memory. >>>>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will >>>>>>>>> release >>>>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since >>>>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes >>>>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't >>>>>>>>> we have end up writing to freed memory in this time interval ? 
Don't we >>>>>>>>> need to postpone pages freeing to after kernel page table unmapping ? >>>>>>>> BOs are only destroyed when there is a guarantee that nobody is >>>>>>>> accessing them any more. >>>>>>>> >>>>>>>> The problem here is that the pages as well as the VRAM can be >>>>>>>> immediately reused after the hotplug event. >>>>>>>> >>>>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's >>>>>>>>>> going to be a different device behind the same mmio bar range most >>>>>>>>>> likely (the higher bridges all this have the same windows assigned), >>>>>>>>> No idea how this actually works but if we haven't called iounmap yet >>>>>>>>> doesn't it mean that those physical ranges that are still mapped into >>>>>>>>> page >>>>>>>>> table should be reserved and cannot be reused for another >>>>>>>>> device ? As a guess, maybe another subrange from the higher bridge's >>>>>>>>> total >>>>>>>>> range will be allocated. >>>>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for >>>>>>>> a range when it is hotplugged. >>>>>>>> >>>>>>>>>> and that's bad news if we keep using it for current drivers. So we >>>>>>>>>> really have to point all these cpu ptes to some other place. >>>>>>>>> We can't just unmap it without syncing against any in kernel accesses >>>>>>>>> to those buffers >>>>>>>>> and since page faulting technique we use for user mapped buffers seems >>>>>>>>> to not be possible >>>>>>>>> for kernel mapped buffers I am not sure how to do it gracefully... >>>>>>>> We could try to replace the kmap with a dummy page under the hood, but >>>>>>>> that is extremely tricky. >>>>>>>> >>>>>>>> Especially since BOs which are just 1 page in size could point to the >>>>>>>> linear mapping directly. >>>>>>> I think it's just more work. Essentially >>>>>>> - convert as much as possible of the kernel mappings to vmap_local, >>>>>>> which Thomas Zimmermann is rolling out. 
That way a dma_resv_lock will >>>>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a >>>>>>> dummy mapping. >>>>>> Read those patches. I am not sure how this helps with protecting >>>>>> against accesses to released backing pages or IO physical ranges of BO >>>>>> which is already mapped during the unplug event ? >>>>> By eliminating such users, and replacing them with local maps which >>>>> are strictly bound in how long they can exist (and hence we can >>>>> serialize against them finishing in our hotunplug code). >>>> Not sure I see how serializing against BO map/unmap helps - our problem as >>>> you described is that once >>>> device is extracted and then something else quickly takes it's place in the >>>> PCI topology >>>> and gets assigned same physical IO ranges, then our driver will start accessing this >>>> new device because our 'zombie' BOs are still pointing to those ranges. >>> Until your driver's remove callback is finished the ranges stay reserved. >> >> The ranges stay reserved until unmapped which happens in bo->destroy >> which for most internally allocated buffers is during sw_fini when last drm_put >> is called. >> >> >>> If that's not the case, then hotunplug would be fundamentally impossible >>> ot handle correctly. >>> >>> Of course all the mmio actions will time out, so it might take some time >>> to get through it all. >> >> I found that PCI code provides pci_device_is_present function >> we can use to avoid timeouts - it reads device vendor and checks if all 1s is >> returned >> or not. We can call it from within register accessors before trying read/write > drm_dev_enter/exit is a _lot_ less overhead, plus makes a lot stronger > guarantees for hotunplug ordering. Please use that one instead of > hand-rolling something which only mostly works for closing hotunplug > races. pciconfig access is really slow. 
> >>>> Another point regarding serializing - problem is that some of those BOs are >>>> very long lived, take for example the HW command >>>> ring buffer Christian mentioned before - >>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span >>>> is basically for the entire time the device exists, it's destroyed only in >>>> the SW fini stage (when last drm_dev >>>> reference is dropped) and so should I grab it's dma_resv_lock from >>>> amdgpu_pci_remove code and wait >>>> for it to be unmapped before proceeding with the PCI remove code ? This can >>>> take unbound time and that why I don't understand >>>> how serializing will help. >>> Uh you need to untangle that. After hw cleanup is done no one is allowed >>> to touch that ringbuffer bo anymore from the kernel. >> >> I would assume we are not allowed to touch it once we identified the device is >> gone in order to minimize the chance of accidental writes to some other device >> which might now >> occupy those IO ranges ? >> >> >>> That's what >>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw >>> references to disappear. >> >> Yes, didn't make sense to me why would we use vmap_local for internally >> allocated buffers. I think we should also guard registers read/writes for the >> same reason as above. >> >> >>> The vmap_local is for mappings done by other drivers, through the dma-buf >>> interface (where "other drivers" can include fbdev/fbcon, if you use the >>> generic helpers). >>> -Daniel >> >> Ok, so I assumed that with vmap_local you were trying to solve the problem of >> quick reinsertion >> of another device into same MMIO range that my driver still points too but >> actually are you trying to solve >> the issue of exported dma buffers outliving the device ? For this we have >> drm_device refcount in the GEM layer >> i think. > That's completely different lifetime problems. 
> Don't mix them up :-)
> One problem is the hardware disappearing, and for that we _have_ to
> guarantee timeliness, or otherwise the pci subsystem gets pissed
> (since like you say, a new device might show up and need its mmio
> bars assigned to io ranges). The other is the lifetime of the software
> objects we use as interfaces, both from userspace and from other
> kernel drivers. There we fundamentally can't enforce timely cleanup,
> and have to resort to refcounting.

So regarding the second issue, as I mentioned above, don't we already use
drm_dev_get/put for exported BOs ? Earlier in this discussion you mentioned
that we are ok for dma buffers since we already have the refcounting at the
GEM layer, and the real life cycle problem we have is the dma_fences, for
which there is no drm_dev refcounting. Seems to me then that vmap_local is
superfluous because of the refcounting we already have for exported
dma_bufs, and for dma_fences it won't help.

Andrey

> We need both.
> -Daniel
>
>> Andrey
>>
>>
>>>> Andrey
>>>>
>>>>
>>>>> It doesn't solve all your problems, but it's a tool to get there.
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>> - worst case keep the system backing storage around for shared
>>>>>>> dma-buf until the other non-dynamic driver releases it. for vram we
>>>>>>> require dynamic importers (and maybe it wasn't such a bright idea to
>>>>>>> allow pinning of importer buffers, might need to revisit that).
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> I loaded the driver with vm_update_mode=3, meaning all VM
>>>>>>>>>>>> updates are done using the CPU, and I haven't seen any OOPs
>>>>>>>>>>>> after removing the device. I guess I can test it more by
>>>>>>>>>>>> allocating GTT and VRAM BOs and trying to read/write to them
>>>>>>>>>>>> after the device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
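[Editor's note] The guard pattern Daniel recommends over per-access `pci_device_is_present()` checks can be illustrated with a small userspace mock. Every name below (`dev_enter`, `reg_read`, the atomic flag, `fake_mmio`) is invented for this sketch; the real kernel primitives are `drm_dev_enter()`/`drm_dev_exit()`, which additionally take an SRCU read lock so the unplug path can wait for accessors already inside the critical section:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mock of the drm_dev_enter/exit pattern: register
 * accessors bail out once the device is marked unplugged, instead of
 * issuing a slow PCI config read on every access. */

static atomic_bool dev_unplugged = false;
static uint32_t fake_mmio[16];  /* stands in for the ioremap'ed BAR */

static bool dev_enter(void)
{
	/* Cheap atomic flag check - no pciconfig cycle per access, which
	 * is the overhead objection raised in the thread. */
	return !atomic_load(&dev_unplugged);
}

static void dev_exit(void)
{
	/* drm_dev_exit() would release the SRCU read lock here. */
}

static uint32_t reg_read(unsigned int idx)
{
	uint32_t val = ~0u;  /* what a read from a dead PCI device returns */

	if (dev_enter()) {
		val = fake_mmio[idx];
		dev_exit();
	}
	return val;
}
```

Once the flag is set at unplug time, every subsequent `reg_read()` returns all 1s without ever touching the (possibly reassigned) MMIO range, mimicking what the hardware itself would return.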
On Thu, Dec 17, 2020 at 04:06:38PM -0500, Andrey Grodzovsky wrote:
> On 12/17/20 3:48 PM, Daniel Vetter wrote:
> > On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> > > On 12/17/20 3:10 PM, Christian König wrote:
> > > > [SNIP]
> > > > > > > > By eliminating such users, and replacing them with local
> > > > > > > > maps which are strictly bound in how long they can exist
> > > > > > > > (and hence we can serialize against them finishing in our
> > > > > > > > hotunplug code).
> > > > > > > Not sure I see how serializing against BO map/unmap helps -
> > > > > > > our problem as you described is that once the device is
> > > > > > > extracted and then something else quickly takes its place in
> > > > > > > the PCI topology and gets assigned the same physical IO
> > > > > > > ranges, then our driver will start accessing this new device
> > > > > > > because our 'zombie' BOs are still pointing to those ranges.
> > > > > > Until your driver's remove callback is finished the ranges
> > > > > > stay reserved.
> > > > > The ranges stay reserved until unmapped which happens in
> > > > > bo->destroy
> > > > I'm not sure of that. Why do you think that?
> > >
> > > Because of this sequence:
> > > ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
> > > Is there another place I am missing ?
> > iounmap is just the mapping, it doesn't reserve anything in the
> > resource tree.
> >
> > And I don't think we should keep resources reserved past the pci
> > remove callback, because that would upset the pci subsystem trying to
> > assign resources to a newly hotplugged pci device.
>
> I assumed we are talking about VA ranges still mapped in the page table.
> I just assumed that part of ioremap is also reservation of the mapped
> physical ranges. In fact, if we can explicitly reserve those ranges (as
> you mention here), then together with postponing the freeing/releasing
> of system memory pages back to the page pool until after the BO is
> unmapped from the kernel address space, I believe this could solve the
> issue of quick HW reinsertion and make all the drm_dev_enter/exit
> guarding obsolete.

We can't reserve these ranges, that's what I tried to explain:
- kernel/resource.c isn't very consistently used
- the pci core will get pissed if there's suddenly a range in the middle
  of a bridge that it can't use
- nesting is allowed for resources, so this doesn't actually guarantee much

I just wanted to point out that ioremap doesn't do any reserving, so it is
not enough by far. We really have to stop using any mmio ranges before the
pci remove callback is finished.
-Daniel

> Andrey
>
> > Also from a quick check amdgpu does not reserve the pci bars it's
> > using. Somehow most drm drivers don't do that, not exactly sure why,
> > maybe auto-enumeration of resources just works too well and we don't
> > need the safety net of kernel/resource.c anymore.
> > -Daniel
> >
> > > > > which for most internally allocated buffers is during sw_fini
> > > > > when the last drm_put is called.
> > > > >
> > > > > > If that's not the case, then hotunplug would be fundamentally
> > > > > > impossible to handle correctly.
> > > > > >
> > > > > > Of course all the mmio actions will time out, so it might take
> > > > > > some time to get through it all.
> > > > >
> > > > > I found that the PCI code provides a pci_device_is_present
> > > > > function we can use to avoid timeouts - it reads the device
> > > > > vendor ID and checks whether all 1s is returned or not. We can
> > > > > call it from within register accessors before trying a read/write.
> > > > That's way too much overhead! We need to keep that much lower or it
> > > > will result in quite a performance drop.
> > > >
> > > > I suggest to rather think about adding drm_dev_enter/exit guards.
> > >
> > > Sure, this one is just a bit upstream to the disconnect event.
> > > Eventually none of them is watertight.
> > >
> > > Andrey
> > >
> > > >
> > > > Christian.
> > > >
> > > > > [SNIP]
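[Editor's note] Daniel's point above - that claiming a range in the resource tree is a separate step from mapping it, and that mapping alone reserves nothing - can be modeled with a toy claims table. Everything here is invented for illustration; the real kernel interfaces are `request_mem_region()`/`pci_request_regions()` for claiming and `ioremap()` for mapping:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy resource tree: a claim fails if it overlaps an existing claim,
 * while "mapping" never consults the table at all - mirroring the fact
 * that ioremap() does not reserve anything in kernel/resource.c. */

struct toy_claim { uint64_t start, len; bool used; };
static struct toy_claim claims[8];

static bool toy_request_region(uint64_t start, uint64_t len)
{
	size_t i;

	for (i = 0; i < 8; i++)
		if (claims[i].used &&
		    start < claims[i].start + claims[i].len &&
		    claims[i].start < start + len)
			return false;  /* overlaps an existing claim */

	for (i = 0; i < 8; i++)
		if (!claims[i].used) {
			claims[i] = (struct toy_claim){ start, len, true };
			return true;
		}
	return false;  /* table full */
}

/* "ioremap": hands back a usable address without claiming anything. */
static uint64_t toy_ioremap(uint64_t start)
{
	return start;
}
```

An overlapping second claim is rejected, but a "map" of the same range succeeds silently - which is exactly why a driver that only ioremaps can keep poking a range the PCI core has already handed to a newly hotplugged device.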
Hey Daniel, back from vacation and going over our last long thread I think
you didn't reply to my last question below (or at least I can't find it).

Andrey

On 12/17/20 4:13 PM, Andrey Grodzovsky wrote:
>>> Ok, so I assumed that with vmap_local you were trying to solve the
>>> problem of quick reinsertion of another device into the same MMIO range
>>> that my driver still points to, but actually are you trying to solve
>>> the issue of exported dma buffers outliving the device ? For this we
>>> have the drm_device refcount in the GEM layer, I think.
>> That's completely different lifetime problems. Don't mix them up :-)
>> One problem is the hardware disappearing, and for that we _have_ to
>> guarantee timeliness, or otherwise the pci subsystem gets pissed
>> (since like you say, a new device might show up and need its mmio
>> bars assigned to io ranges). The other is the lifetime of the software
>> objects we use as interfaces, both from userspace and from other
>> kernel drivers. There we fundamentally can't enforce timely cleanup,
>> and have to resort to refcounting.
>
> So regarding the second issue, as I mentioned above, don't we already use
> drm_dev_get/put for exported BOs ? Earlier in this discussion you
> mentioned that we are ok for dma buffers since we already have the
> refcounting at the GEM layer, and the real life cycle problem we have is
> the dma_fences, for which there is no drm_dev refcounting. Seems to me
> then that vmap_local is superfluous because of the refcounting we already
> have for exported dma_bufs, and for dma_fences it won't help.
>
> Andrey
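[Editor's note] The two lifetimes Daniel distinguishes - hardware gone immediately, software object alive until the last reference drops - can be sketched with a minimal refcount mock. The struct and function names below are made up, standing in for `drm_dev_get()`/`drm_dev_put()` and the final `drm_dev_release()`:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mock of the software-object lifetime: the hardware flag can be cleared
 * synchronously at unplug, but the object itself is freed only when the
 * last reference (e.g. one held via an exported dma-buf) is dropped. */

struct toy_drm_device {
	atomic_int refcount;
	bool hw_present;  /* cleared at pci remove time */
	bool freed;       /* set only when the last reference goes away */
};

static void toy_drm_dev_get(struct toy_drm_device *d)
{
	atomic_fetch_add(&d->refcount, 1);
}

static void toy_drm_dev_put(struct toy_drm_device *d)
{
	/* fetch_sub returns the old value: 1 means we dropped the last ref */
	if (atomic_fetch_sub(&d->refcount, 1) == 1)
		d->freed = true;  /* drm_dev_release()/kfree() in the kernel */
}
```

As the thread notes, this refcounting already exists for exported dma-bufs via the GEM layer; the open life-cycle problem is the dma_fences, which hold no such reference.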
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 1ccf1ef..29248a5 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
 	else
 		ttm_pool_unpopulate(ttm);
 }
+EXPORT_SYMBOL(ttm_tt_unpopulate);
It's needed to drop iommu backed pages on device unplug
before device's IOMMU group is released.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 +
 1 file changed, 1 insertion(+)
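[Editor's note] For context, a rough kernel-style pseudocode sketch of the kind of caller this export enables. The real user lives in the separate patch "drm/amdgpu: Register IOMMU topology notifier per device" (not shown in this archive), so the notifier action and the BO-iteration helper below are assumptions made for illustration only, not the actual amdgpu code:

```c
/* Pseudocode sketch - NOT the actual patch.  for_each_device_bo() is a
 * hypothetical helper; the real iteration and locking are driver specific. */
static int amdgpu_iommu_notifier(struct notifier_block *nb,
				 unsigned long action, void *data)
{
	if (action == IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER) {
		/* Drop IOMMU-backed pages before the device's IOMMU group
		 * is released - the reason ttm_tt_unpopulate() needs to be
		 * callable from the driver, hence the EXPORT_SYMBOL above. */
		for_each_device_bo(bo)
			ttm_tt_unpopulate(bo->ttm);
	}
	return NOTIFY_OK;
}
```

As Christian suggests in the follow-up, a TTM-level helper along the lines of ttm_resource_manager_evict_all() could make this per-driver unpopulate loop unnecessary.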