diff mbox series

[1/1] drm/amdkfd: fix build error with AMD_IOMMU_V2=m

Message ID 20210309032356.20800-1-Felix.Kuehling@amd.com (mailing list archive)
State New, archived
Headers show
Series [1/1] drm/amdkfd: fix build error with AMD_IOMMU_V2=m | expand

Commit Message

Felix Kuehling March 9, 2021, 3:23 a.m. UTC
Using 'imply AMD_IOMMU_V2' does not guarantee that the driver can link
against the exported functions. If the GPU driver is built-in but the
IOMMU driver is a loadable module, the kfd_iommu.c file is indeed
built but does not work:

x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_bind_process_to_device':
kfd_iommu.c:(.text+0x516): undefined reference to `amd_iommu_bind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_unbind_process':
kfd_iommu.c:(.text+0x691): undefined reference to `amd_iommu_unbind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_suspend':
kfd_iommu.c:(.text+0x966): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x97f): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x9a4): undefined reference to `amd_iommu_free_device'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_resume':
kfd_iommu.c:(.text+0xa9a): undefined reference to `amd_iommu_init_device'
x86_64-linux-ld: kfd_iommu.c:(.text+0xadc): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xaff): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xc72): undefined reference to `amd_iommu_bind_pasid'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe08): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe26): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe42): undefined reference to `amd_iommu_free_device'

Use IS_REACHABLE to only build IOMMU-V2 support if the amd_iommu symbols
are reachable by the amdkfd driver. Output a warning if they are not,
because that may not be what the user was expecting.

Fixes: 64d1c3a43a6f ("drm/amdkfd: Centralize IOMMUv2 code and make it conditional")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h |  6 ++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

Comments

Arnd Bergmann March 9, 2021, 8:58 a.m. UTC | #1
On Tue, Mar 9, 2021 at 4:23 AM Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>
> Using 'imply AMD_IOMMU_V2' does not guarantee that the driver can link
> against the exported functions. If the GPU driver is built-in but the
> IOMMU driver is a loadable module, the kfd_iommu.c file is indeed
> built but does not work:
>
> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_bind_process_to_device':
> kfd_iommu.c:(.text+0x516): undefined reference to `amd_iommu_bind_pasid'
> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_unbind_process':
> kfd_iommu.c:(.text+0x691): undefined reference to `amd_iommu_unbind_pasid'
> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_suspend':
> kfd_iommu.c:(.text+0x966): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0x97f): undefined reference to `amd_iommu_set_invalid_ppr_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0x9a4): undefined reference to `amd_iommu_free_device'
> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_resume':
> kfd_iommu.c:(.text+0xa9a): undefined reference to `amd_iommu_init_device'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xadc): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xaff): undefined reference to `amd_iommu_set_invalid_ppr_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xc72): undefined reference to `amd_iommu_bind_pasid'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xe08): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xe26): undefined reference to `amd_iommu_set_invalid_ppr_cb'
> x86_64-linux-ld: kfd_iommu.c:(.text+0xe42): undefined reference to `amd_iommu_free_device'
>
> Use IS_REACHABLE to only build IOMMU-V2 support if the amd_iommu symbols
> are reachable by the amdkfd driver. Output a warning if they are not,
> because that may not be what the user was expecting.

This would fix the compile-time failure, but it still fails my CI
because you introduce
a compile-time warning. Can you change it into a runtime warning instead?

Neither type of warning is likely to actually reach the person trying
to debug the
problem, so you still end up with machines that don't do what they should,
but I can live with the runtime warning as long as the build doesn't break.

I think the proper fix would be to not rely on custom hooks into a particular
IOMMU driver, but to instead ensure that the amdgpu driver can do everything
it needs through the regular linux/iommu.h interfaces. I realize this
is more work,
but I wonder if you've tried that, and why it didn't work out.

       Arnd
Felix Kuehling March 9, 2021, 4:30 p.m. UTC | #2
Am 2021-03-09 um 3:58 a.m. schrieb Arnd Bergmann:
> On Tue, Mar 9, 2021 at 4:23 AM Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> Using 'imply AMD_IOMMU_V2' does not guarantee that the driver can link
>> against the exported functions. If the GPU driver is built-in but the
>> IOMMU driver is a loadable module, the kfd_iommu.c file is indeed
>> built but does not work:
>>
>> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_bind_process_to_device':
>> kfd_iommu.c:(.text+0x516): undefined reference to `amd_iommu_bind_pasid'
>> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_unbind_process':
>> kfd_iommu.c:(.text+0x691): undefined reference to `amd_iommu_unbind_pasid'
>> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_suspend':
>> kfd_iommu.c:(.text+0x966): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0x97f): undefined reference to `amd_iommu_set_invalid_ppr_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0x9a4): undefined reference to `amd_iommu_free_device'
>> x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_resume':
>> kfd_iommu.c:(.text+0xa9a): undefined reference to `amd_iommu_init_device'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xadc): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xaff): undefined reference to `amd_iommu_set_invalid_ppr_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xc72): undefined reference to `amd_iommu_bind_pasid'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xe08): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xe26): undefined reference to `amd_iommu_set_invalid_ppr_cb'
>> x86_64-linux-ld: kfd_iommu.c:(.text+0xe42): undefined reference to `amd_iommu_free_device'
>>
>> Use IS_REACHABLE to only build IOMMU-V2 support if the amd_iommu symbols
>> are reachable by the amdkfd driver. Output a warning if they are not,
>> because that may not be what the user was expecting.
> This would fix the compile-time failure, but it still fails my CI
> because you introduce
> a compile-time warning. Can you change it into a runtime warning instead?

Sure.


>
> Neither type of warning is likely to actually reach the person trying
> to debug the
> problem, so you still end up with machines that don't do what they should,
> but I can live with the runtime warning as long as the build doesn't break.

OK.


>
> I think the proper fix would be to not rely on custom hooks into a particular
> IOMMU driver, but to instead ensure that the amdgpu driver can do everything
> it needs through the regular linux/iommu.h interfaces. I realize this
> is more work,
> but I wonder if you've tried that, and why it didn't work out.

As far as I know this hasn't been tried. I see that intel-iommu has its
own SVM thing, which seems to be similar to what our IOMMUv2 does. I
guess we'd have to abstract that into a common API.

Regards,
  Felix


>
>        Arnd
Jean-Philippe Brucker March 9, 2021, 5:33 p.m. UTC | #3
Hi Felix,

On Tue, Mar 09, 2021 at 11:30:19AM -0500, Felix Kuehling wrote:
> > I think the proper fix would be to not rely on custom hooks into a particular
> > IOMMU driver, but to instead ensure that the amdgpu driver can do everything
> > it needs through the regular linux/iommu.h interfaces. I realize this
> > is more work,
> > but I wonder if you've tried that, and why it didn't work out.
> 
> As far as I know this hasn't been tried. I see that intel-iommu has its
> own SVM thing, which seems to be similar to what our IOMMUv2 does. I
> guess we'd have to abstract that into a common API.

The common API was added in 26b25a2b98e4 and implemented by the Intel
driver in 064a57d7ddfc. To support it an IOMMU driver implements new IOMMU
ops:
	.dev_has_feat()
	.dev_feat_enabled()
	.dev_enable_feat()
	.dev_disable_feat()
	.sva_bind()
	.sva_unbind()
	.sva_get_pasid()

And a device driver calls iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA)
followed by iommu_sva_bind_device().

If I remember correctly the biggest obstacle for KFD is the PASID
allocation, done by the GPU driver instead of the IOMMU driver, but there
may be others.

Thanks,
Jean
Alex Deucher March 9, 2021, 5:59 p.m. UTC | #4
On Tue, Mar 9, 2021 at 12:55 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> Hi Felix,
>
> On Tue, Mar 09, 2021 at 11:30:19AM -0500, Felix Kuehling wrote:
> > > I think the proper fix would be to not rely on custom hooks into a particular
> > > IOMMU driver, but to instead ensure that the amdgpu driver can do everything
> > > it needs through the regular linux/iommu.h interfaces. I realize this
> > > is more work,
> > > but I wonder if you've tried that, and why it didn't work out.
> >
> > As far as I know this hasn't been tried. I see that intel-iommu has its
> > own SVM thing, which seems to be similar to what our IOMMUv2 does. I
> > guess we'd have to abstract that into a common API.
>
> The common API was added in 26b25a2b98e4 and implemented by the Intel
> driver in 064a57d7ddfc. To support it an IOMMU driver implements new IOMMU
> ops:
>         .dev_has_feat()
>         .dev_feat_enabled()
>         .dev_enable_feat()
>         .dev_disable_feat()
>         .sva_bind()
>         .sva_unbind()
>         .sva_get_pasid()
>
> And a device driver calls iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA)
> followed by iommu_sva_bind_device().
>
> If I remember correctly the biggest obstacle for KFD is the PASID
> allocation, done by the GPU driver instead of the IOMMU driver, but there
> may be others.

IIRC, we tried to make the original IOMMUv2 functionality generic but
other vendors were not interested at the time, so it ended up being
AMD specific and since nothing else was using the pasid allocations we
put them in the GPU driver.  I guess if this is generic now, it could
be moved to a common API and taken out of the driver.

Alex
Christian König March 9, 2021, 6:34 p.m. UTC | #5
Am 09.03.21 um 18:59 schrieb Alex Deucher:
> On Tue, Mar 9, 2021 at 12:55 PM Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
>> Hi Felix,
>>
>> On Tue, Mar 09, 2021 at 11:30:19AM -0500, Felix Kuehling wrote:
>>>> I think the proper fix would be to not rely on custom hooks into a particular
>>>> IOMMU driver, but to instead ensure that the amdgpu driver can do everything
>>>> it needs through the regular linux/iommu.h interfaces. I realize this
>>>> is more work,
>>>> but I wonder if you've tried that, and why it didn't work out.
>>> As far as I know this hasn't been tried. I see that intel-iommu has its
>>> own SVM thing, which seems to be similar to what our IOMMUv2 does. I
>>> guess we'd have to abstract that into a common API.
>> The common API was added in 26b25a2b98e4 and implemented by the Intel
>> driver in 064a57d7ddfc. To support it an IOMMU driver implements new IOMMU
>> ops:
>>          .dev_has_feat()
>>          .dev_feat_enabled()
>>          .dev_enable_feat()
>>          .dev_disable_feat()
>>          .sva_bind()
>>          .sva_unbind()
>>          .sva_get_pasid()
>>
>> And a device driver calls iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA)
>> followed by iommu_sva_bind_device().
>>
>> If I remember correctly the biggest obstacle for KFD is the PASID
>> allocation, done by the GPU driver instead of the IOMMU driver, but there
>> may be others.
> IIRC, we tried to make the original IOMMUv2 functionality generic but
> other vendors were not interested at the time, so it ended up being
> AMD specific and since nothing else was using the pasid allocations we
> put them in the GPU driver.  I guess if this is generic now, it could
> be moved to a common API and taken out of the driver.

There has been quite some effort for this already for generic PASID 
interface etc.. But it looks like that effort is stalled by now.

Anyway at least I'm perfectly fine to have the IOMMUv2 || !IOMMUv2 
dependency on the core amdgpu driver for x86.

That should solve the build problem at hand quite nicely.

Regards,
Christian.

>
> Alex
Arnd Bergmann March 11, 2021, 10:02 a.m. UTC | #6
On Tue, Mar 9, 2021 at 7:34 PM Christian König <christian.koenig@amd.com> wrote:
> Am 09.03.21 um 18:59 schrieb Alex Deucher:
>
> There has been quite some effort for this already for generic PASID
> interface etc.. But it looks like that effort is stalled by now.
>
> Anyway at least I'm perfectly fine to have the IOMMUv2 || !IOMMUv2
> dependency on the core amdgpu driver for x86.
>
> That should solve the build problem at hand quite nicely.

Right, that sounds better than the IS_REACHABLE() hack, and would fix
the immediate build regression until the driver is changed to use the proper
IOMMU interfaces.

       Arnd
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 66bbca61e3ef..7199eb833f66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -20,6 +20,10 @@ 
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include <linux/kconfig.h>
+
+#if IS_REACHABLE(CONFIG_AMD_IOMMU_V2)
+
 #include <linux/printk.h>
 #include <linux/device.h>
 #include <linux/slab.h>
@@ -355,3 +359,9 @@  int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
 
 	return 0;
 }
+
+#else
+
+#warning "Moldular IOMMU-V2 is not usable by built-in KFD"
+
+#endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
index dd23d9fdf6a8..b25365fc2c4e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
@@ -23,7 +23,9 @@ 
 #ifndef __KFD_IOMMU_H__
 #define __KFD_IOMMU_H__
 
-#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
+#include <linux/kconfig.h>
+
+#if IS_REACHABLE(CONFIG_AMD_IOMMU_V2)
 
 #define KFD_SUPPORT_IOMMU_V2
 
@@ -73,6 +75,6 @@  static inline int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
 	return 0;
 }
 
-#endif /* defined(CONFIG_AMD_IOMMU_V2) */
+#endif /* IS_REACHABLE(CONFIG_AMD_IOMMU_V2) */
 
 #endif /* __KFD_IOMMU_H__ */