diff mbox series

[06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

Message ID 20230125195401.4183544-7-jonathan.kim@amd.com (mailing list archive)
State New, archived
Headers show
Series Upstream of kernel support for AMDGPU ISA debugging | expand

Commit Message

Kim, Jonathan Jan. 25, 2023, 7:53 p.m. UTC
Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequence as
follow on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regards to debug mode setting transaction requests and wave inheritence
of that mode is upheld.

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

v2: remove unnecessary static srbm lock renaming.
add comments to explain ignored arguments for debug trap enable and
disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 93 +++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  9 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |  3 +
 3 files changed, 105 insertions(+)

Comments

kernel test robot Jan. 29, 2023, 5:12 a.m. UTC | #1
Hi Jonathan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-tip/drm-tip]
[also build test WARNING on linus/master v6.2-rc5]
[cannot apply to drm-misc/drm-misc-next drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jonathan-Kim/drm-amdkfd-add-debug-and-runtime-enable-interface/20230128-092952
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
patch link:    https://lore.kernel.org/r/20230125195401.4183544-7-jonathan.kim%40amd.com
patch subject: [PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230129/202301291257.PRqg0VpG-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/11bb8b2034cd92b687a2d5461298cc72d720d5c9
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Jonathan-Kim/drm-amdkfd-add-debug-and-runtime-enable-interface/20230128-092952
        git checkout 11bb8b2034cd92b687a2d5461298cc72d720d5c9
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/amd/amdgpu/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:694: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * restore_dbg_reisters is ignored here but is a general interface requirement
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:718: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * keep_trap_enabled is ignored here but is a general interface requirement


vim +694 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c

   692	
   693	/**
 > 694	 * restore_dbg_reisters is ignored here but is a general interface requirement
   695	 * for devices that support GFXOFF and where the RLC save/restore list
   696	 * does not support hw registers for debugging i.e. the driver has to manually
   697	 * initialize the debug mode registers after it has disabled GFX off during the
   698	 * debug session.
   699	 */
   700	uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
   701					bool restore_dbg_registers,
   702					uint32_t vmid)
   703	{
   704		mutex_lock(&adev->grbm_idx_mutex);
   705	
   706		kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
   707	
   708		WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
   709	
   710		kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
   711	
   712		mutex_unlock(&adev->grbm_idx_mutex);
   713	
   714		return 0;
   715	}
   716
Felix Kuehling Feb. 16, 2023, 10:54 p.m. UTC | #2
On 2023-01-25 14:53, Jonathan Kim wrote:
> Implement the per-device calls to enable or disable HW debug mode for
> GFX9 prior to GFX9.4.1.
>
> GFX9.4.1 and onward will require their own enable/disable sequence as
> follow on patches.
>
> When hardware debug mode setting is requested, waves will inherit
> these settings in the Shader Processor Input's (SPI) Sequencer Global
> Block (SQG). This means that the KGD must drain all waves from the SPI
> into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
> to ensure that the order of operations that the debugger expects with
> regards to debug mode setting transaction requests and wave inheritence
> of that mode is upheld.
>
> Also ensure that exception overrides are reset to their original state
> prior to debug enable or disable.
>
> v2: remove unnecessary static srbm lock renaming.
> add comments to explain ignored arguments for debug trap enable and
> disable.
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 93 +++++++++++++++++++
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 9 ++
> drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 +
> 3 files changed, 105 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index e92b93557c13..94a9fd9bd984 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -646,6 +646,97 @@ int kgd_gfx_v9_wave_control_execute(struct 
> amdgpu_device *adev,
> return 0;
> }
> +/*
> + * GFX9 helper for wave launch stall requirements on debug trap setting.
> + *
> + * vmid:
> + * Target VMID to stall/unstall.
> + *
> + * stall:
> + * 0-unstall wave launch (enable), 1-stall wave launch (disable).
> + * After wavefront launch has been stalled, allocated waves must 
> drain from
> + * SPI in order for debug trap settings to take effect on those waves.
> + * This is roughly a ~96 clock cycle wait on SPI where a read on
> + * SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
> + * KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of 
> reads required.
> + *
> + * NOTE: We can afford to clear the entire STALL_VMID field on unstall
> + * because GFX9.4.1 cannot support multi-process debugging due to trap
> + * configuration and masking being limited to global scope. Always assume
> + * single process conditions.
> +
> + */
> +#define KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY 3
> +void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
> + uint32_t vmid,
> + bool stall)
> +{
> + int i;
> + uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
> +
> + if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1))
> + data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
> + stall ? 1 << vmid : 0);
> + else
> + data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_RA,
> + stall ? 1 : 0);
> +
> + WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
> +
> + if (!stall)
> + return;
> +
> + for (i = 0; i < KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
> + RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
> +}
> +
> +/**

This was flagged by the kernel test robot. Should just be /* because 
it's not a formal doc comment.


> + * restore_dbg_reisters is ignored here but is a general interface 
> requirement

Typo: reisters -> registers


> + * for devices that support GFXOFF and where the RLC save/restore list
> + * does not support hw registers for debugging i.e. the driver has to 
> manually
> + * initialize the debug mode registers after it has disabled GFX off 
> during the
> + * debug session.
> + */
> +uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
> + bool restore_dbg_registers,
> + uint32_t vmid)
> +{
> + mutex_lock(&adev->grbm_idx_mutex);
> +
> + kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
> +
> + WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
> +
> + kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
> +
> + mutex_unlock(&adev->grbm_idx_mutex);
> +
> + return 0;
> +}
> +
> +/**

Same as above. With those fixed, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> + * keep_trap_enabled is ignored here but is a general interface 
> requirement
> + * for devices that support multi-process debugging where the performance
> + * overhead from trap temporary setup needs to be bypassed when the debug
> + * session has ended.
> + */
> +uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
> + bool keep_trap_enabled,
> + uint32_t vmid)
> +{
> + mutex_lock(&adev->grbm_idx_mutex);
> +
> + kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
> +
> + WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
> +
> + kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
> +
> + mutex_unlock(&adev->grbm_idx_mutex);
> +
> + return 0;
> +}
> +
> void kgd_gfx_v9_set_vm_context_page_table_base(struct amdgpu_device *adev,
> uint32_t vmid, uint64_t page_table_base)
> {
> @@ -871,6 +962,8 @@ const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
> .get_atc_vmid_pasid_mapping_info =
> kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
> .set_vm_context_page_table_base = 
> kgd_gfx_v9_set_vm_context_page_table_base,
> + .enable_debug_trap = kgd_gfx_v9_enable_debug_trap,
> + .disable_debug_trap = kgd_gfx_v9_disable_debug_trap,
> .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
> .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
> };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> index c7ed3bc9053c..d39256162616 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> @@ -58,3 +58,12 @@ void kgd_gfx_v9_get_cu_occupancy(struct 
> amdgpu_device *adev, int pasid,
> int *pasid_wave_cnt, int *max_waves_per_cu);
> void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev,
> uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr);
> +void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
> + uint32_t vmid,
> + bool stall);
> +uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
> + bool restore_dbg_registers,
> + uint32_t vmid);
> +uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
> + bool keep_trap_enabled,
> + uint32_t vmid);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> index b2217eb1399c..8aa7a3ad4e97 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> @@ -25,6 +25,9 @@
> #include "kfd_priv.h"
> +void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
> + uint32_t vmid,
> + bool stall);
> int kfd_dbg_trap_disable(struct kfd_process *target);
> int kfd_dbg_trap_enable(struct kfd_process *target, uint32_t fd,
> void __user *runtime_info,
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e92b93557c13..94a9fd9bd984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -646,6 +646,97 @@  int kgd_gfx_v9_wave_control_execute(struct amdgpu_device *adev,
 	return 0;
 }
 
+/*
+ * GFX9 helper for wave launch stall requirements on debug trap setting.
+ *
+ * vmid:
+ *   Target VMID to stall/unstall.
+ *
+ * stall:
+ *   0-unstall wave launch (enable), 1-stall wave launch (disable).
+ *   After wavefront launch has been stalled, allocated waves must drain from
+ *   SPI in order for debug trap settings to take effect on those waves.
+ *   This is roughly a ~96 clock cycle wait on SPI where a read on
+ *   SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
+ *   KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of reads required.
+ *
+ *   NOTE: We can afford to clear the entire STALL_VMID field on unstall
+ *   because GFX9.4.1 cannot support multi-process debugging due to trap
+ *   configuration and masking being limited to global scope.  Always assume
+ *   single process conditions.
+
+ */
+#define KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY	3
+void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
+					uint32_t vmid,
+					bool stall)
+{
+	int i;
+	uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+
+	if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1))
+		data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
+							stall ? 1 << vmid : 0);
+	else
+		data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_RA,
+							stall ? 1 : 0);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
+
+	if (!stall)
+		return;
+
+	for (i = 0; i < KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
+		RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+}
+
+/**
+ * restore_dbg_reisters is ignored here but is a general interface requirement
+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
+				bool restore_dbg_registers,
+				uint32_t vmid)
+{
+	mutex_lock(&adev->grbm_idx_mutex);
+
+	kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+	kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+	mutex_unlock(&adev->grbm_idx_mutex);
+
+	return 0;
+}
+
+/**
+ * keep_trap_enabled is ignored here but is a general interface requirement
+ * for devices that support multi-process debugging where the performance
+ * overhead from trap temporary setup needs to be bypassed when the debug
+ * session has ended.
+ */
+uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
+					bool keep_trap_enabled,
+					uint32_t vmid)
+{
+	mutex_lock(&adev->grbm_idx_mutex);
+
+	kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+	kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+	mutex_unlock(&adev->grbm_idx_mutex);
+
+	return 0;
+}
+
 void kgd_gfx_v9_set_vm_context_page_table_base(struct amdgpu_device *adev,
 			uint32_t vmid, uint64_t page_table_base)
 {
@@ -871,6 +962,8 @@  const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
 	.get_atc_vmid_pasid_mapping_info =
 			kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
 	.set_vm_context_page_table_base = kgd_gfx_v9_set_vm_context_page_table_base,
+	.enable_debug_trap = kgd_gfx_v9_enable_debug_trap,
+	.disable_debug_trap = kgd_gfx_v9_disable_debug_trap,
 	.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index c7ed3bc9053c..d39256162616 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -58,3 +58,12 @@  void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev, int pasid,
 		int *pasid_wave_cnt, int *max_waves_per_cu);
 void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev,
 		uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr);
+void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
+					uint32_t vmid,
+					bool stall);
+uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
+				      bool restore_dbg_registers,
+				      uint32_t vmid);
+uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
+					bool keep_trap_enabled,
+					uint32_t vmid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index b2217eb1399c..8aa7a3ad4e97 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -25,6 +25,9 @@ 
 
 #include "kfd_priv.h"
 
+void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
+					uint32_t vmid,
+					bool stall);
 int kfd_dbg_trap_disable(struct kfd_process *target);
 int kfd_dbg_trap_enable(struct kfd_process *target, uint32_t fd,
 			void __user *runtime_info,