drm/msm/a6xx: Skip gpu secure fw load in EL2 mode

Message ID	20241209-drm-msm-kvm-support-v1-1-1c983a8a8087@quicinc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Akhil P Oommen <quic_akhilpo@quicinc.com> Date: Mon, 9 Dec 2024 13:49:15 +0530 Subject: [PATCH] drm/msm/a6xx: Skip gpu secure fw load in EL2 mode MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-ID: <20241209-drm-msm-kvm-support-v1-1-1c983a8a8087@quicinc.com> To: Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>, "Konrad Dybcio" <konradybcio@kernel.org>, Abhinav Kumar <quic_abhinavk@quicinc.com>, Dmitry Baryshkov <dmitry.baryshkov@linaro.org>, Marijn Suijten <marijn.suijten@somainline.org>, David Airlie <airlied@gmail.com>, "Simona Vetter" <simona@ffwll.ch>, Elliot Berman <quic_eberman@quicinc.com>, "Pavan Kondeti" <quic_pkondeti@quicinc.com> CC: <linux-arm-msm@vger.kernel.org>, <dri-devel@lists.freedesktop.org>, <freedreno@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, Akhil P Oommen <quic_akhilpo@quicinc.com> Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/msm/a6xx: Skip gpu secure fw load in EL2 mode \| expand drm/msm/a6xx: Skip gpu secure fw load in EL2 mode

Akhil P Oommen Dec. 9, 2024, 8:19 a.m. UTC

When kernel is booted in EL2, SECVID registers are accessible to the
KMD. So we can use that to switch GPU's secure mode to avoid dependency
on Zap firmware. Also, we can't load a secure firmware without a
hypervisor that supports it.

Tested following configurations on sa8775p chipset (Adreno 663 gpu):

1. Gunyah (No KVM) - Loads zap shader based on DT
2. KVM in VHE - Skips zap shader load and programs SECVID register
3. KVM in nVHE - Loads zap shader based on DT
4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
	programs SECVID register

For (1) and (3) configuration, this patch doesn't have any impact.
Driver loads secure firmware based on other existing hints.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 28 deletions(-)


---
base-commit: f4a867a46862c1743501bbe8c813238456ec8699
change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6

Best regards,

Konrad Dybcio Dec. 9, 2024, 3:03 p.m. UTC | #1

On 9.12.2024 9:19 AM, Akhil P Oommen wrote:
> When kernel is booted in EL2, SECVID registers are accessible to the
> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> on Zap firmware. Also, we can't load a secure firmware without a
> hypervisor that supports it.
> 
> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
> 
> 1. Gunyah (No KVM) - Loads zap shader based on DT
> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> 3. KVM in nVHE - Loads zap shader based on DT
> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
> 	programs SECVID register
> 
> For (1) and (3) configuration, this patch doesn't have any impact.
> Driver loads secure firmware based on other existing hints.
> 
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> ---

[...]

> +
> +#ifdef CONFIG_ARM64
> +	/*
> +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> +	 * to switch the secure mode to avoid the dependency on zap shader.
> +	 */
> +	if (is_kernel_in_hyp_mode())
> +		goto direct_switch;

So I suppose this would ideally be like hv_is_hyperv_initialized()
but for QHEE/Gunyah, which is not going to happen, as we have
millions of devices with old unupstreamable-ABI-Gunyah running..

This looks like the next best things then, so no objections, but..

[...]

> +	ret = a6xx_switch_secure_mode(gpu);
> +	if (!ret)

this should definitely be a if (ret)

Konrad

Rob Clark Dec. 9, 2024, 7:54 p.m. UTC | #2

On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> When kernel is booted in EL2, SECVID registers are accessible to the
> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> on Zap firmware. Also, we can't load a secure firmware without a
> hypervisor that supports it.

Shouldn't we do this based on whether zap node is in dtb (and not disabled)?

slbounce applies some dtb overlays to disable the zap node when
booting in EL2 (and make some other changes due to kernel being in
control of the pci smmuv3, or something along those lines).

BR,
-R

>
> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>
> 1. Gunyah (No KVM) - Loads zap shader based on DT
> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> 3. KVM in nVHE - Loads zap shader based on DT
> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>         programs SECVID register
>
> For (1) and (3) configuration, this patch doesn't have any impact.
> Driver loads secure firmware based on other existing hints.
>
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> ---
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>  1 file changed, 54 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 019610341df1..9dcaa8472430 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -14,6 +14,10 @@
>  #include <linux/pm_domain.h>
>  #include <linux/soc/qcom/llcc-qcom.h>
>
> +#ifdef CONFIG_ARM64
> +#include <asm/virt.h>
> +#endif
> +
>  #define GPU_PAS_ID 13
>
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>         return ret;
>  }
>
> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> +{
> +       int ret;
> +
> +#ifdef CONFIG_ARM64
> +       /*
> +        * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> +        * to switch the secure mode to avoid the dependency on zap shader.
> +        */
> +       if (is_kernel_in_hyp_mode())
> +               goto direct_switch;
> +#endif
> +
> +       /*
> +        * Try to load a zap shader into the secure world. If successful
> +        * we can use the CP to switch out of secure mode. If not then we
> +        * have no resource but to try to switch ourselves out manually. If we
> +        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> +        * be blocked and a permissions violation will soon follow.
> +        */
> +       ret = a6xx_zap_shader_init(gpu);
> +       if (ret == -ENODEV) {
> +               /*
> +                * This device does not use zap shader (but print a warning
> +                * just in case someone got their dt wrong.. hopefully they
> +                * have a debug UART to realize the error of their ways...
> +                * if you mess this up you are about to crash horribly)
> +                */
> +               dev_warn_once(gpu->dev->dev,
> +                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> +               goto direct_switch;
> +       } else if (ret)
> +               return ret;
> +
> +       OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> +       OUT_RING(gpu->rb[0], 0x00000000);
> +
> +       a6xx_flush(gpu, gpu->rb[0]);
> +       if (!a6xx_idle(gpu, gpu->rb[0]))
> +               return -EINVAL;
> +
> +       return 0;
> +
> +direct_switch:
> +       gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> +       return 0;
> +}
> +
>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>                        A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>                        A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>         if (ret)
>                 goto out;
>
> -       /*
> -        * Try to load a zap shader into the secure world. If successful
> -        * we can use the CP to switch out of secure mode. If not then we
> -        * have no resource but to try to switch ourselves out manually. If we
> -        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> -        * be blocked and a permissions violation will soon follow.
> -        */
> -       ret = a6xx_zap_shader_init(gpu);
> -       if (!ret) {
> -               OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> -               OUT_RING(gpu->rb[0], 0x00000000);
> -
> -               a6xx_flush(gpu, gpu->rb[0]);
> -               if (!a6xx_idle(gpu, gpu->rb[0]))
> -                       return -EINVAL;
> -       } else if (ret == -ENODEV) {
> -               /*
> -                * This device does not use zap shader (but print a warning
> -                * just in case someone got their dt wrong.. hopefully they
> -                * have a debug UART to realize the error of their ways...
> -                * if you mess this up you are about to crash horribly)
> -                */
> -               dev_warn_once(gpu->dev->dev,
> -                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> -               gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> -               ret = 0;
> -       } else {
> +       ret = a6xx_switch_secure_mode(gpu);
> +       if (!ret)
>                 return ret;
> -       }
>
>  out:
>         if (adreno_has_gmu_wrapper(adreno_gpu))
>
> ---
> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
>
> Best regards,
> --
> Akhil P Oommen <quic_akhilpo@quicinc.com>
>

Akhil P Oommen Dec. 9, 2024, 8:52 p.m. UTC | #3

On 12/10/2024 1:24 AM, Rob Clark wrote:
> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>
>> When kernel is booted in EL2, SECVID registers are accessible to the
>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
>> on Zap firmware. Also, we can't load a secure firmware without a
>> hypervisor that supports it.
> 
> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?

This is better, isn't it? Otherwise, multiple overlays should be
maintained for each soc/board since EL2 can be toggled from bootloader.
And this feature is likely going to be more widely available.

-Akhil.

> 
> slbounce applies some dtb overlays to disable the zap node when
> booting in EL2 (and make some other changes due to kernel being in
> control of the pci smmuv3, or something along those lines).
> 
> BR,
> -R
> 
>>
>> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>>
>> 1. Gunyah (No KVM) - Loads zap shader based on DT
>> 2. KVM in VHE - Skips zap shader load and programs SECVID register
>> 3. KVM in nVHE - Loads zap shader based on DT
>> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>>         programs SECVID register
>>
>> For (1) and (3) configuration, this patch doesn't have any impact.
>> Driver loads secure firmware based on other existing hints.
>>
>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>> ---
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>>  1 file changed, 54 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 019610341df1..9dcaa8472430 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -14,6 +14,10 @@
>>  #include <linux/pm_domain.h>
>>  #include <linux/soc/qcom/llcc-qcom.h>
>>
>> +#ifdef CONFIG_ARM64
>> +#include <asm/virt.h>
>> +#endif
>> +
>>  #define GPU_PAS_ID 13
>>
>>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>>         return ret;
>>  }
>>
>> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
>> +{
>> +       int ret;
>> +
>> +#ifdef CONFIG_ARM64
>> +       /*
>> +        * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
>> +        * to switch the secure mode to avoid the dependency on zap shader.
>> +        */
>> +       if (is_kernel_in_hyp_mode())
>> +               goto direct_switch;
>> +#endif
>> +
>> +       /*
>> +        * Try to load a zap shader into the secure world. If successful
>> +        * we can use the CP to switch out of secure mode. If not then we
>> +        * have no resource but to try to switch ourselves out manually. If we
>> +        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>> +        * be blocked and a permissions violation will soon follow.
>> +        */
>> +       ret = a6xx_zap_shader_init(gpu);
>> +       if (ret == -ENODEV) {
>> +               /*
>> +                * This device does not use zap shader (but print a warning
>> +                * just in case someone got their dt wrong.. hopefully they
>> +                * have a debug UART to realize the error of their ways...
>> +                * if you mess this up you are about to crash horribly)
>> +                */
>> +               dev_warn_once(gpu->dev->dev,
>> +                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>> +               goto direct_switch;
>> +       } else if (ret)
>> +               return ret;
>> +
>> +       OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>> +       OUT_RING(gpu->rb[0], 0x00000000);
>> +
>> +       a6xx_flush(gpu, gpu->rb[0]);
>> +       if (!a6xx_idle(gpu, gpu->rb[0]))
>> +               return -EINVAL;
>> +
>> +       return 0;
>> +
>> +direct_switch:
>> +       gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>> +       return 0;
>> +}
>> +
>>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>>                        A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>>                        A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
>> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>>         if (ret)
>>                 goto out;
>>
>> -       /*
>> -        * Try to load a zap shader into the secure world. If successful
>> -        * we can use the CP to switch out of secure mode. If not then we
>> -        * have no resource but to try to switch ourselves out manually. If we
>> -        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>> -        * be blocked and a permissions violation will soon follow.
>> -        */
>> -       ret = a6xx_zap_shader_init(gpu);
>> -       if (!ret) {
>> -               OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>> -               OUT_RING(gpu->rb[0], 0x00000000);
>> -
>> -               a6xx_flush(gpu, gpu->rb[0]);
>> -               if (!a6xx_idle(gpu, gpu->rb[0]))
>> -                       return -EINVAL;
>> -       } else if (ret == -ENODEV) {
>> -               /*
>> -                * This device does not use zap shader (but print a warning
>> -                * just in case someone got their dt wrong.. hopefully they
>> -                * have a debug UART to realize the error of their ways...
>> -                * if you mess this up you are about to crash horribly)
>> -                */
>> -               dev_warn_once(gpu->dev->dev,
>> -                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>> -               gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>> -               ret = 0;
>> -       } else {
>> +       ret = a6xx_switch_secure_mode(gpu);
>> +       if (!ret)
>>                 return ret;
>> -       }
>>
>>  out:
>>         if (adreno_has_gmu_wrapper(adreno_gpu))
>>
>> ---
>> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
>> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
>>
>> Best regards,
>> --
>> Akhil P Oommen <quic_akhilpo@quicinc.com>
>>

Akhil P Oommen Dec. 9, 2024, 8:54 p.m. UTC | #4

On 12/9/2024 8:33 PM, Konrad Dybcio wrote:
> On 9.12.2024 9:19 AM, Akhil P Oommen wrote:
>> When kernel is booted in EL2, SECVID registers are accessible to the
>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
>> on Zap firmware. Also, we can't load a secure firmware without a
>> hypervisor that supports it.
>>
>> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>>
>> 1. Gunyah (No KVM) - Loads zap shader based on DT
>> 2. KVM in VHE - Skips zap shader load and programs SECVID register
>> 3. KVM in nVHE - Loads zap shader based on DT
>> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>> 	programs SECVID register
>>
>> For (1) and (3) configuration, this patch doesn't have any impact.
>> Driver loads secure firmware based on other existing hints.
>>
>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>> ---
> 
> [...]
> 
>> +
>> +#ifdef CONFIG_ARM64
>> +	/*
>> +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
>> +	 * to switch the secure mode to avoid the dependency on zap shader.
>> +	 */
>> +	if (is_kernel_in_hyp_mode())
>> +		goto direct_switch;
> 
> So I suppose this would ideally be like hv_is_hyperv_initialized()
> but for QHEE/Gunyah, which is not going to happen, as we have
> millions of devices with old unupstreamable-ABI-Gunyah running..
> 
> This looks like the next best things then, so no objections, but..
> 
> [...]
> 
>> +	ret = a6xx_switch_secure_mode(gpu);
>> +	if (!ret)
> 
> this should definitely be a if (ret)

My bad! Thanks.

-Akhil.
> 
> Konrad

Rob Clark Dec. 9, 2024, 9:56 p.m. UTC | #5

On Mon, Dec 9, 2024 at 12:52 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On 12/10/2024 1:24 AM, Rob Clark wrote:
> > On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >>
> >> When kernel is booted in EL2, SECVID registers are accessible to the
> >> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> >> on Zap firmware. Also, we can't load a secure firmware without a
> >> hypervisor that supports it.
> >
> > Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
>
> This is better, isn't it? Otherwise, multiple overlays should be
> maintained for each soc/board since EL2 can be toggled from bootloader.
> And this feature is likely going to be more widely available.

I guess the first question is what the dt should look like.  I think
it makes sense to not have a zap node when booting in EL2 (or at least
disabling it) because that describes the hw+fw situation.  And in any
case, so far it seems like we often need unrelated changes[1].  But
maybe others have differing opinions.

And depending on how much cooperation we get from the bootloader, it
could be that our hand is forced.  I figured I should at least point
out how we currently handle this.

A further point, I suppose it is in theory possible that a device
could have no secure playback support, despite booting in EL1?  So
tying this to EL2 seems a bit contrived.

BR,
-R

[1] https://github.com/TravMurav/slbounce/blob/main/dtbo/x1e-el2.dtso

> -Akhil.
>
> >
> > slbounce applies some dtb overlays to disable the zap node when
> > booting in EL2 (and make some other changes due to kernel being in
> > control of the pci smmuv3, or something along those lines).
> >
> > BR,
> > -R
> >
> >>
> >> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
> >>
> >> 1. Gunyah (No KVM) - Loads zap shader based on DT
> >> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> >> 3. KVM in nVHE - Loads zap shader based on DT
> >> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
> >>         programs SECVID register
> >>
> >> For (1) and (3) configuration, this patch doesn't have any impact.
> >> Driver loads secure firmware based on other existing hints.
> >>
> >> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> >> ---
> >> ---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
> >>  1 file changed, 54 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> index 019610341df1..9dcaa8472430 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> @@ -14,6 +14,10 @@
> >>  #include <linux/pm_domain.h>
> >>  #include <linux/soc/qcom/llcc-qcom.h>
> >>
> >> +#ifdef CONFIG_ARM64
> >> +#include <asm/virt.h>
> >> +#endif
> >> +
> >>  #define GPU_PAS_ID 13
> >>
> >>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
> >>         return ret;
> >>  }
> >>
> >> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> >> +{
> >> +       int ret;
> >> +
> >> +#ifdef CONFIG_ARM64
> >> +       /*
> >> +        * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> >> +        * to switch the secure mode to avoid the dependency on zap shader.
> >> +        */
> >> +       if (is_kernel_in_hyp_mode())
> >> +               goto direct_switch;
> >> +#endif
> >> +
> >> +       /*
> >> +        * Try to load a zap shader into the secure world. If successful
> >> +        * we can use the CP to switch out of secure mode. If not then we
> >> +        * have no resource but to try to switch ourselves out manually. If we
> >> +        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> >> +        * be blocked and a permissions violation will soon follow.
> >> +        */
> >> +       ret = a6xx_zap_shader_init(gpu);
> >> +       if (ret == -ENODEV) {
> >> +               /*
> >> +                * This device does not use zap shader (but print a warning
> >> +                * just in case someone got their dt wrong.. hopefully they
> >> +                * have a debug UART to realize the error of their ways...
> >> +                * if you mess this up you are about to crash horribly)
> >> +                */
> >> +               dev_warn_once(gpu->dev->dev,
> >> +                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> >> +               goto direct_switch;
> >> +       } else if (ret)
> >> +               return ret;
> >> +
> >> +       OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> >> +       OUT_RING(gpu->rb[0], 0x00000000);
> >> +
> >> +       a6xx_flush(gpu, gpu->rb[0]);
> >> +       if (!a6xx_idle(gpu, gpu->rb[0]))
> >> +               return -EINVAL;
> >> +
> >> +       return 0;
> >> +
> >> +direct_switch:
> >> +       gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> >> +       return 0;
> >> +}
> >> +
> >>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
> >>                        A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
> >>                        A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
> >> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
> >>         if (ret)
> >>                 goto out;
> >>
> >> -       /*
> >> -        * Try to load a zap shader into the secure world. If successful
> >> -        * we can use the CP to switch out of secure mode. If not then we
> >> -        * have no resource but to try to switch ourselves out manually. If we
> >> -        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> >> -        * be blocked and a permissions violation will soon follow.
> >> -        */
> >> -       ret = a6xx_zap_shader_init(gpu);
> >> -       if (!ret) {
> >> -               OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> >> -               OUT_RING(gpu->rb[0], 0x00000000);
> >> -
> >> -               a6xx_flush(gpu, gpu->rb[0]);
> >> -               if (!a6xx_idle(gpu, gpu->rb[0]))
> >> -                       return -EINVAL;
> >> -       } else if (ret == -ENODEV) {
> >> -               /*
> >> -                * This device does not use zap shader (but print a warning
> >> -                * just in case someone got their dt wrong.. hopefully they
> >> -                * have a debug UART to realize the error of their ways...
> >> -                * if you mess this up you are about to crash horribly)
> >> -                */
> >> -               dev_warn_once(gpu->dev->dev,
> >> -                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> >> -               gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> >> -               ret = 0;
> >> -       } else {
> >> +       ret = a6xx_switch_secure_mode(gpu);
> >> +       if (!ret)
> >>                 return ret;
> >> -       }
> >>
> >>  out:
> >>         if (adreno_has_gmu_wrapper(adreno_gpu))
> >>
> >> ---
> >> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
> >> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
> >>
> >> Best regards,
> >> --
> >> Akhil P Oommen <quic_akhilpo@quicinc.com>
> >>
>

Akhil P Oommen Dec. 10, 2024, 9:13 a.m. UTC | #6

On 12/10/2024 3:26 AM, Rob Clark wrote:
> On Mon, Dec 9, 2024 at 12:52 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>
>> On 12/10/2024 1:24 AM, Rob Clark wrote:
>>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>>>
>>>> When kernel is booted in EL2, SECVID registers are accessible to the
>>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
>>>> on Zap firmware. Also, we can't load a secure firmware without a
>>>> hypervisor that supports it.
>>>
>>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
>>
>> This is better, isn't it? Otherwise, multiple overlays should be
>> maintained for each soc/board since EL2 can be toggled from bootloader.
>> And this feature is likely going to be more widely available.
> 
> I guess the first question is what the dt should look like.  I think
> it makes sense to not have a zap node when booting in EL2 (or at least
> disabling it) because that describes the hw+fw situation.  And in any
> case, so far it seems like we often need unrelated changes[1].  But
> maybe others have differing opinions.
> 
> And depending on how much cooperation we get from the bootloader, it
> could be that our hand is forced.  I figured I should at least point
> out how we currently handle this.

At the moment, the bootloader we use on sa8775p doesn't have overlay
support. So I felt we could free GPU from that requirement and get it
enabled independently. At least to get the basic things working *out of
the box*. We have Display, GPU and Guest VM working with no extra changes.

> 
> A further point, I suppose it is in theory possible that a device
> could have no secure playback support, despite booting in EL1?  So
> tying this to EL2 seems a bit contrived.
> 

That is correct. But not sure how widely that configuration would be
used practically. OTOH, Kernel running at EL2 in qcom chipsets is going
to be more wider in usage. This is showing up as a common requirement
for non-handset chipsets, especially IoT. So a special case here in our
driver doesn't seem bad to me. Let software work out of the box, instead
of "disable GPU by default". ;)

-Akhil.

> BR,
> -R
> 
> [1] https://github.com/TravMurav/slbounce/blob/main/dtbo/x1e-el2.dtso
> 
>> -Akhil.
>>
>>>
>>> slbounce applies some dtb overlays to disable the zap node when
>>> booting in EL2 (and make some other changes due to kernel being in
>>> control of the pci smmuv3, or something along those lines).
>>>
>>> BR,
>>> -R
>>>
>>>>
>>>> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>>>>
>>>> 1. Gunyah (No KVM) - Loads zap shader based on DT
>>>> 2. KVM in VHE - Skips zap shader load and programs SECVID register
>>>> 3. KVM in nVHE - Loads zap shader based on DT
>>>> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>>>>         programs SECVID register
>>>>
>>>> For (1) and (3) configuration, this patch doesn't have any impact.
>>>> Driver loads secure firmware based on other existing hints.
>>>>
>>>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>>>> ---
>>>> ---
>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>>>>  1 file changed, 54 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> index 019610341df1..9dcaa8472430 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> @@ -14,6 +14,10 @@
>>>>  #include <linux/pm_domain.h>
>>>>  #include <linux/soc/qcom/llcc-qcom.h>
>>>>
>>>> +#ifdef CONFIG_ARM64
>>>> +#include <asm/virt.h>
>>>> +#endif
>>>> +
>>>>  #define GPU_PAS_ID 13
>>>>
>>>>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>>> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>>>>         return ret;
>>>>  }
>>>>
>>>> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
>>>> +{
>>>> +       int ret;
>>>> +
>>>> +#ifdef CONFIG_ARM64
>>>> +       /*
>>>> +        * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
>>>> +        * to switch the secure mode to avoid the dependency on zap shader.
>>>> +        */
>>>> +       if (is_kernel_in_hyp_mode())
>>>> +               goto direct_switch;
>>>> +#endif
>>>> +
>>>> +       /*
>>>> +        * Try to load a zap shader into the secure world. If successful
>>>> +        * we can use the CP to switch out of secure mode. If not then we
>>>> +        * have no resource but to try to switch ourselves out manually. If we
>>>> +        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>>>> +        * be blocked and a permissions violation will soon follow.
>>>> +        */
>>>> +       ret = a6xx_zap_shader_init(gpu);
>>>> +       if (ret == -ENODEV) {
>>>> +               /*
>>>> +                * This device does not use zap shader (but print a warning
>>>> +                * just in case someone got their dt wrong.. hopefully they
>>>> +                * have a debug UART to realize the error of their ways...
>>>> +                * if you mess this up you are about to crash horribly)
>>>> +                */
>>>> +               dev_warn_once(gpu->dev->dev,
>>>> +                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>>>> +               goto direct_switch;
>>>> +       } else if (ret)
>>>> +               return ret;
>>>> +
>>>> +       OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>>>> +       OUT_RING(gpu->rb[0], 0x00000000);
>>>> +
>>>> +       a6xx_flush(gpu, gpu->rb[0]);
>>>> +       if (!a6xx_idle(gpu, gpu->rb[0]))
>>>> +               return -EINVAL;
>>>> +
>>>> +       return 0;
>>>> +
>>>> +direct_switch:
>>>> +       gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>>>> +       return 0;
>>>> +}
>>>> +
>>>>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>>>>                        A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>>>>                        A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
>>>> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>>>>         if (ret)
>>>>                 goto out;
>>>>
>>>> -       /*
>>>> -        * Try to load a zap shader into the secure world. If successful
>>>> -        * we can use the CP to switch out of secure mode. If not then we
>>>> -        * have no resource but to try to switch ourselves out manually. If we
>>>> -        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>>>> -        * be blocked and a permissions violation will soon follow.
>>>> -        */
>>>> -       ret = a6xx_zap_shader_init(gpu);
>>>> -       if (!ret) {
>>>> -               OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>>>> -               OUT_RING(gpu->rb[0], 0x00000000);
>>>> -
>>>> -               a6xx_flush(gpu, gpu->rb[0]);
>>>> -               if (!a6xx_idle(gpu, gpu->rb[0]))
>>>> -                       return -EINVAL;
>>>> -       } else if (ret == -ENODEV) {
>>>> -               /*
>>>> -                * This device does not use zap shader (but print a warning
>>>> -                * just in case someone got their dt wrong.. hopefully they
>>>> -                * have a debug UART to realize the error of their ways...
>>>> -                * if you mess this up you are about to crash horribly)
>>>> -                */
>>>> -               dev_warn_once(gpu->dev->dev,
>>>> -                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>>>> -               gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>>>> -               ret = 0;
>>>> -       } else {
>>>> +       ret = a6xx_switch_secure_mode(gpu);
>>>> +       if (!ret)
>>>>                 return ret;
>>>> -       }
>>>>
>>>>  out:
>>>>         if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>
>>>> ---
>>>> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
>>>> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
>>>>
>>>> Best regards,
>>>> --
>>>> Akhil P Oommen <quic_akhilpo@quicinc.com>
>>>>
>>

Elliot Berman Dec. 10, 2024, 8:54 p.m. UTC | #7

On Mon, Dec 09, 2024 at 01:49:15PM +0530, Akhil P Oommen wrote:
> When kernel is booted in EL2, SECVID registers are accessible to the
> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> on Zap firmware. Also, we can't load a secure firmware without a
> hypervisor that supports it.
> 
> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
> 
> 1. Gunyah (No KVM) - Loads zap shader based on DT
> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> 3. KVM in nVHE - Loads zap shader based on DT

I think this might be misleading. As I understand, KVM in nVHE doesn't
support loading secure firmware. I'm not aware of any support added to
make it work. So, the driver will try to load zap shader and it fails
same as it does today.

> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
> 	programs SECVID register
> 
> For (1) and (3) configuration, this patch doesn't have any impact.
> Driver loads secure firmware based on other existing hints.
> 
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> ---
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>  1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 019610341df1..9dcaa8472430 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -14,6 +14,10 @@
>  #include <linux/pm_domain.h>
>  #include <linux/soc/qcom/llcc-qcom.h>
>  
> +#ifdef CONFIG_ARM64
> +#include <asm/virt.h>
> +#endif
> +
>  #define GPU_PAS_ID 13
>  
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>  	return ret;
>  }
>  
> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> +{
> +	int ret;
> +
> +#ifdef CONFIG_ARM64
> +	/*
> +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> +	 * to switch the secure mode to avoid the dependency on zap shader.
> +	 */
> +	if (is_kernel_in_hyp_mode())
> +		goto direct_switch;
> +#endif
> +
> +	/*
> +	 * Try to load a zap shader into the secure world. If successful
> +	 * we can use the CP to switch out of secure mode. If not then we
> +	 * have no resource but to try to switch ourselves out manually. If we
> +	 * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> +	 * be blocked and a permissions violation will soon follow.
> +	 */
> +	ret = a6xx_zap_shader_init(gpu);
> +	if (ret == -ENODEV) {
> +		/*
> +		 * This device does not use zap shader (but print a warning
> +		 * just in case someone got their dt wrong.. hopefully they
> +		 * have a debug UART to realize the error of their ways...
> +		 * if you mess this up you are about to crash horribly)
> +		 */
> +		dev_warn_once(gpu->dev->dev,
> +			"Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> +		goto direct_switch;
> +	} else if (ret)
> +		return ret;
> +
> +	OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> +	OUT_RING(gpu->rb[0], 0x00000000);
> +
> +	a6xx_flush(gpu, gpu->rb[0]);
> +	if (!a6xx_idle(gpu, gpu->rb[0]))
> +		return -EINVAL;
> +
> +	return 0;
> +
> +direct_switch:
> +	gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> +	return 0;
> +}
> +
>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>  		       A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>  		       A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>  	if (ret)
>  		goto out;
>  
> -	/*
> -	 * Try to load a zap shader into the secure world. If successful
> -	 * we can use the CP to switch out of secure mode. If not then we
> -	 * have no resource but to try to switch ourselves out manually. If we
> -	 * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> -	 * be blocked and a permissions violation will soon follow.
> -	 */
> -	ret = a6xx_zap_shader_init(gpu);
> -	if (!ret) {
> -		OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> -		OUT_RING(gpu->rb[0], 0x00000000);
> -
> -		a6xx_flush(gpu, gpu->rb[0]);
> -		if (!a6xx_idle(gpu, gpu->rb[0]))
> -			return -EINVAL;
> -	} else if (ret == -ENODEV) {
> -		/*
> -		 * This device does not use zap shader (but print a warning
> -		 * just in case someone got their dt wrong.. hopefully they
> -		 * have a debug UART to realize the error of their ways...
> -		 * if you mess this up you are about to crash horribly)
> -		 */
> -		dev_warn_once(gpu->dev->dev,
> -			"Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> -		gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> -		ret = 0;
> -	} else {
> +	ret = a6xx_switch_secure_mode(gpu);
> +	if (!ret)
>  		return ret;
> -	}
>  
>  out:
>  	if (adreno_has_gmu_wrapper(adreno_gpu))
> 
> ---
> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
> 
> Best regards,
> -- 
> Akhil P Oommen <quic_akhilpo@quicinc.com>
>

Marc Zyngier Dec. 10, 2024, 9:24 p.m. UTC | #8

On Mon, 09 Dec 2024 08:19:15 +0000,
Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> 
> When kernel is booted in EL2, SECVID registers are accessible to the
> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> on Zap firmware. Also, we can't load a secure firmware without a
> hypervisor that supports it.
> 
> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
> 
> 1. Gunyah (No KVM) - Loads zap shader based on DT
> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> 3. KVM in nVHE - Loads zap shader based on DT
> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
> 	programs SECVID register
>
> For (1) and (3) configuration, this patch doesn't have any impact.
> Driver loads secure firmware based on other existing hints.

The moment the kernel is entered at EL2, this is a bare metal
situation, and everything is accessible. So your distinction between
VHE and nVHE (which would equally apply to hVHE and pKVM) makes no
sense at all. Same thing for KVM being disabled, which has no bearing
on what can be accessed.

> 
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> ---
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>  1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 019610341df1..9dcaa8472430 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -14,6 +14,10 @@
>  #include <linux/pm_domain.h>
>  #include <linux/soc/qcom/llcc-qcom.h>
>  
> +#ifdef CONFIG_ARM64
> +#include <asm/virt.h>
> +#endif

How about 32bit ARM?

> +
>  #define GPU_PAS_ID 13
>  
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>  	return ret;
>  }
>  
> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> +{
> +	int ret;
> +
> +#ifdef CONFIG_ARM64
> +	/*
> +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> +	 * to switch the secure mode to avoid the dependency on zap shader.
> +	 */
> +	if (is_kernel_in_hyp_mode())
> +		goto direct_switch;

No, please. To check whether you are *booted* at EL2, you need to
check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
none of the driver's business, really. This is still absolutely
disgusting from an abstraction perspective, but I guess we don't have
much choice here.

In the future, for anything involving KVM, please Cc the maintainers,
reviewers and mailing list listed in the MAINTAINERS file.

Thanks,

	M.

Connor Abbott Dec. 10, 2024, 10:52 p.m. UTC | #9

On Mon, Dec 9, 2024 at 3:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> When kernel is booted in EL2, SECVID registers are accessible to the
> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> on Zap firmware. Also, we can't load a secure firmware without a
> hypervisor that supports it.
>
> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>
> 1. Gunyah (No KVM) - Loads zap shader based on DT
> 2. KVM in VHE - Skips zap shader load and programs SECVID register
> 3. KVM in nVHE - Loads zap shader based on DT
> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>         programs SECVID register
>
> For (1) and (3) configuration, this patch doesn't have any impact.
> Driver loads secure firmware based on other existing hints.
>
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

For initializing CX_MISC_SW_FUSE_VALUE in a7xx_cx_mem_init(), we used
!qcom_scm_is_available() to assume that the register is writable
instead - can you just do that?

Connor

> ---
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>  1 file changed, 54 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 019610341df1..9dcaa8472430 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -14,6 +14,10 @@
>  #include <linux/pm_domain.h>
>  #include <linux/soc/qcom/llcc-qcom.h>
>
> +#ifdef CONFIG_ARM64
> +#include <asm/virt.h>
> +#endif
> +
>  #define GPU_PAS_ID 13
>
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>         return ret;
>  }
>
> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> +{
> +       int ret;
> +
> +#ifdef CONFIG_ARM64
> +       /*
> +        * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> +        * to switch the secure mode to avoid the dependency on zap shader.
> +        */
> +       if (is_kernel_in_hyp_mode())
> +               goto direct_switch;
> +#endif
> +
> +       /*
> +        * Try to load a zap shader into the secure world. If successful
> +        * we can use the CP to switch out of secure mode. If not then we
> +        * have no resource but to try to switch ourselves out manually. If we
> +        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> +        * be blocked and a permissions violation will soon follow.
> +        */
> +       ret = a6xx_zap_shader_init(gpu);
> +       if (ret == -ENODEV) {
> +               /*
> +                * This device does not use zap shader (but print a warning
> +                * just in case someone got their dt wrong.. hopefully they
> +                * have a debug UART to realize the error of their ways...
> +                * if you mess this up you are about to crash horribly)
> +                */
> +               dev_warn_once(gpu->dev->dev,
> +                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> +               goto direct_switch;
> +       } else if (ret)
> +               return ret;
> +
> +       OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> +       OUT_RING(gpu->rb[0], 0x00000000);
> +
> +       a6xx_flush(gpu, gpu->rb[0]);
> +       if (!a6xx_idle(gpu, gpu->rb[0]))
> +               return -EINVAL;
> +
> +       return 0;
> +
> +direct_switch:
> +       gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> +       return 0;
> +}
> +
>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>                        A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>                        A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>         if (ret)
>                 goto out;
>
> -       /*
> -        * Try to load a zap shader into the secure world. If successful
> -        * we can use the CP to switch out of secure mode. If not then we
> -        * have no resource but to try to switch ourselves out manually. If we
> -        * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
> -        * be blocked and a permissions violation will soon follow.
> -        */
> -       ret = a6xx_zap_shader_init(gpu);
> -       if (!ret) {
> -               OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
> -               OUT_RING(gpu->rb[0], 0x00000000);
> -
> -               a6xx_flush(gpu, gpu->rb[0]);
> -               if (!a6xx_idle(gpu, gpu->rb[0]))
> -                       return -EINVAL;
> -       } else if (ret == -ENODEV) {
> -               /*
> -                * This device does not use zap shader (but print a warning
> -                * just in case someone got their dt wrong.. hopefully they
> -                * have a debug UART to realize the error of their ways...
> -                * if you mess this up you are about to crash horribly)
> -                */
> -               dev_warn_once(gpu->dev->dev,
> -                       "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
> -               gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
> -               ret = 0;
> -       } else {
> +       ret = a6xx_switch_secure_mode(gpu);
> +       if (!ret)
>                 return ret;
> -       }
>
>  out:
>         if (adreno_has_gmu_wrapper(adreno_gpu))
>
> ---
> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
>
> Best regards,
> --
> Akhil P Oommen <quic_akhilpo@quicinc.com>
>

Pavan Kondeti Dec. 11, 2024, 12:37 a.m. UTC | #10

On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > +{
> > +	int ret;
> > +
> > +#ifdef CONFIG_ARM64
> > +	/*
> > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > +	 */
> > +	if (is_kernel_in_hyp_mode())
> > +		goto direct_switch;
> 
> No, please. To check whether you are *booted* at EL2, you need to
> check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> none of the driver's business, really. This is still absolutely
> disgusting from an abstraction perspective, but I guess we don't have
> much choice here.
> 

Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
available for modules? Do you prefer exporting
kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
try something like [1]?

[1]
https://lore.kernel.org/lkml/1443649252-10702-1-git-send-email-ralf@ramses-pyramidenbau.de/#t

Thanks,
Pavan

Pavan Kondeti Dec. 11, 2024, 12:40 a.m. UTC | #11

On Tue, Dec 10, 2024 at 05:52:27PM -0500, Connor Abbott wrote:
> On Mon, Dec 9, 2024 at 3:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >
> > When kernel is booted in EL2, SECVID registers are accessible to the
> > KMD. So we can use that to switch GPU's secure mode to avoid dependency
> > on Zap firmware. Also, we can't load a secure firmware without a
> > hypervisor that supports it.
> >
> > Tested following configurations on sa8775p chipset (Adreno 663 gpu):
> >
> > 1. Gunyah (No KVM) - Loads zap shader based on DT
> > 2. KVM in VHE - Skips zap shader load and programs SECVID register
> > 3. KVM in nVHE - Loads zap shader based on DT
> > 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
> >         programs SECVID register
> >
> > For (1) and (3) configuration, this patch doesn't have any impact.
> > Driver loads secure firmware based on other existing hints.
> >
> > Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> 
> For initializing CX_MISC_SW_FUSE_VALUE in a7xx_cx_mem_init(), we used
> !qcom_scm_is_available() to assume that the register is writable
> instead - can you just do that?
> 

qcom_scm_is_avaialble() returns true as most of the QC TZ firmware API
works w/ kernel running as bare metal i.e booted with EL2. Any services
that assume QC firmware @ EL2 (QHEE) is absent.

Thanks,
Pavan

Bjorn Andersson Dec. 11, 2024, 1:13 a.m. UTC | #12

On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
> On 12/10/2024 1:24 AM, Rob Clark wrote:
> > On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >>
> >> When kernel is booted in EL2, SECVID registers are accessible to the
> >> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> >> on Zap firmware. Also, we can't load a secure firmware without a
> >> hypervisor that supports it.
> > 
> > Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
> 
> This is better, isn't it? Otherwise, multiple overlays should be
> maintained for each soc/board since EL2 can be toggled from bootloader.
> And this feature is likely going to be more widely available.
> 

The DeviceTree passed to the OS needs to describe the world that said OS
is going to operate in. If you change the world you need to change the
description.
There are several other examples where this would be necessary
(remoteproc and watchdog to name two examples from the Qualcomm upstream
world).

So, if we can cover this by zap-shader being enabled or disabled, that
sounds like a clean and scaleable solution.

Regards,
Bjorn

Akhil P Oommen Dec. 11, 2024, 3:08 a.m. UTC | #13

On 12/11/2024 6:43 AM, Bjorn Andersson wrote:
> On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
>> On 12/10/2024 1:24 AM, Rob Clark wrote:
>>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>>>
>>>> When kernel is booted in EL2, SECVID registers are accessible to the
>>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
>>>> on Zap firmware. Also, we can't load a secure firmware without a
>>>> hypervisor that supports it.
>>>
>>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
>>
>> This is better, isn't it? Otherwise, multiple overlays should be
>> maintained for each soc/board since EL2 can be toggled from bootloader.
>> And this feature is likely going to be more widely available.
>>
> 
> The DeviceTree passed to the OS needs to describe the world that said OS
> is going to operate in. If you change the world you need to change the
> description.
> There are several other examples where this would be necessary
> (remoteproc and watchdog to name two examples from the Qualcomm upstream
> world).

But basic things work without those changes, right? For eg: Desktop UI

> 
> So, if we can cover this by zap-shader being enabled or disabled, that
> sounds like a clean and scaleable solution.

I think we are focusing too much on zap shader. If the driver can
determine itself about access to its register, shouldn't it be allowed
to use that?

-Akhil

> 
> Regards,
> Bjorn

Akhil P Oommen Dec. 11, 2024, 3:09 a.m. UTC | #14

On 12/11/2024 2:24 AM, Elliot Berman wrote:
> On Mon, Dec 09, 2024 at 01:49:15PM +0530, Akhil P Oommen wrote:
>> When kernel is booted in EL2, SECVID registers are accessible to the
>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
>> on Zap firmware. Also, we can't load a secure firmware without a
>> hypervisor that supports it.
>>
>> Tested following configurations on sa8775p chipset (Adreno 663 gpu):
>>
>> 1. Gunyah (No KVM) - Loads zap shader based on DT
>> 2. KVM in VHE - Skips zap shader load and programs SECVID register
>> 3. KVM in nVHE - Loads zap shader based on DT
> 
> I think this might be misleading. As I understand, KVM in nVHE doesn't
> support loading secure firmware. I'm not aware of any support added to
> make it work. So, the driver will try to load zap shader and it fails
> same as it does today.
>

I see that now. I was trying to document the decision logic in each case.

-Akhil.

>> 4. Kernel in EL2 with CONFIG_KVM=n - Skips zap shader load and
>> 	programs SECVID register
>>
>> For (1) and (3) configuration, this patch doesn't have any impact.
>> Driver loads secure firmware based on other existing hints.
>>
>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>> ---
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 82 +++++++++++++++++++++++------------
>>  1 file changed, 54 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 019610341df1..9dcaa8472430 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -14,6 +14,10 @@
>>  #include <linux/pm_domain.h>
>>  #include <linux/soc/qcom/llcc-qcom.h>
>>  
>> +#ifdef CONFIG_ARM64
>> +#include <asm/virt.h>
>> +#endif
>> +
>>  #define GPU_PAS_ID 13
>>  
>>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>> @@ -998,6 +1002,54 @@ static int a6xx_zap_shader_init(struct msm_gpu *gpu)
>>  	return ret;
>>  }
>>  
>> +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
>> +{
>> +	int ret;
>> +
>> +#ifdef CONFIG_ARM64
>> +	/*
>> +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
>> +	 * to switch the secure mode to avoid the dependency on zap shader.
>> +	 */
>> +	if (is_kernel_in_hyp_mode())
>> +		goto direct_switch;
>> +#endif
>> +
>> +	/*
>> +	 * Try to load a zap shader into the secure world. If successful
>> +	 * we can use the CP to switch out of secure mode. If not then we
>> +	 * have no resource but to try to switch ourselves out manually. If we
>> +	 * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>> +	 * be blocked and a permissions violation will soon follow.
>> +	 */
>> +	ret = a6xx_zap_shader_init(gpu);
>> +	if (ret == -ENODEV) {
>> +		/*
>> +		 * This device does not use zap shader (but print a warning
>> +		 * just in case someone got their dt wrong.. hopefully they
>> +		 * have a debug UART to realize the error of their ways...
>> +		 * if you mess this up you are about to crash horribly)
>> +		 */
>> +		dev_warn_once(gpu->dev->dev,
>> +			"Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>> +		goto direct_switch;
>> +	} else if (ret)
>> +		return ret;
>> +
>> +	OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>> +	OUT_RING(gpu->rb[0], 0x00000000);
>> +
>> +	a6xx_flush(gpu, gpu->rb[0]);
>> +	if (!a6xx_idle(gpu, gpu->rb[0]))
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +
>> +direct_switch:
>> +	gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>> +	return 0;
>> +}
>> +
>>  #define A6XX_INT_MASK (A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR | \
>>  		       A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW | \
>>  		       A6XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
>> @@ -1341,35 +1393,9 @@ static int hw_init(struct msm_gpu *gpu)
>>  	if (ret)
>>  		goto out;
>>  
>> -	/*
>> -	 * Try to load a zap shader into the secure world. If successful
>> -	 * we can use the CP to switch out of secure mode. If not then we
>> -	 * have no resource but to try to switch ourselves out manually. If we
>> -	 * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
>> -	 * be blocked and a permissions violation will soon follow.
>> -	 */
>> -	ret = a6xx_zap_shader_init(gpu);
>> -	if (!ret) {
>> -		OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
>> -		OUT_RING(gpu->rb[0], 0x00000000);
>> -
>> -		a6xx_flush(gpu, gpu->rb[0]);
>> -		if (!a6xx_idle(gpu, gpu->rb[0]))
>> -			return -EINVAL;
>> -	} else if (ret == -ENODEV) {
>> -		/*
>> -		 * This device does not use zap shader (but print a warning
>> -		 * just in case someone got their dt wrong.. hopefully they
>> -		 * have a debug UART to realize the error of their ways...
>> -		 * if you mess this up you are about to crash horribly)
>> -		 */
>> -		dev_warn_once(gpu->dev->dev,
>> -			"Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
>> -		gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
>> -		ret = 0;
>> -	} else {
>> +	ret = a6xx_switch_secure_mode(gpu);
>> +	if (!ret)
>>  		return ret;
>> -	}
>>  
>>  out:
>>  	if (adreno_has_gmu_wrapper(adreno_gpu))
>>
>> ---
>> base-commit: f4a867a46862c1743501bbe8c813238456ec8699
>> change-id: 20241120-drm-msm-kvm-support-cd6e6744ced6
>>
>> Best regards,
>> -- 
>> Akhil P Oommen <quic_akhilpo@quicinc.com>
>>

Rob Clark Dec. 11, 2024, 3:43 a.m. UTC | #15

On Tue, Dec 10, 2024 at 7:08 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On 12/11/2024 6:43 AM, Bjorn Andersson wrote:
> > On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
> >> On 12/10/2024 1:24 AM, Rob Clark wrote:
> >>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >>>>
> >>>> When kernel is booted in EL2, SECVID registers are accessible to the
> >>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> >>>> on Zap firmware. Also, we can't load a secure firmware without a
> >>>> hypervisor that supports it.
> >>>
> >>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
> >>
> >> This is better, isn't it? Otherwise, multiple overlays should be
> >> maintained for each soc/board since EL2 can be toggled from bootloader.
> >> And this feature is likely going to be more widely available.
> >>
> >
> > The DeviceTree passed to the OS needs to describe the world that said OS
> > is going to operate in. If you change the world you need to change the
> > description.
> > There are several other examples where this would be necessary
> > (remoteproc and watchdog to name two examples from the Qualcomm upstream
> > world).
>
> But basic things work without those changes, right? For eg: Desktop UI

It isn't really so much about whether certain use-cases can work with
a sub-optimal description of the hw (where in this case "hw" really
means "hw plus how the fw allows things to look to the HLOS").. It is
more about the hw/fw/whatever providing an accurate description of
what things look like to the HLOS.

I'm leaning more towards the hw+fw providing HLOS an accurate view...
and the fact that that carries over into other areas of dtb (ie. it
isn't the only thing that slbounce needs to patch, as I previously
mentioned) reinforces my view there.  This seems like a thing to fix
in fw/bootloader tbh.

BR,
-R


>
> >
> > So, if we can cover this by zap-shader being enabled or disabled, that
> > sounds like a clean and scaleable solution.
>
> I think we are focusing too much on zap shader. If the driver can
> determine itself about access to its register, shouldn't it be allowed
> to use that?
>
> -Akhil
>
> >
> > Regards,
> > Bjorn
>

Pavan Kondeti Dec. 11, 2024, 7:36 a.m. UTC | #16

+devicetree

On Tue, Dec 10, 2024 at 07:43:19PM -0800, Rob Clark wrote:
> On Tue, Dec 10, 2024 at 7:08 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >
> > On 12/11/2024 6:43 AM, Bjorn Andersson wrote:
> > > On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
> > >> On 12/10/2024 1:24 AM, Rob Clark wrote:
> > >>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> > >>>>
> > >>>> When kernel is booted in EL2, SECVID registers are accessible to the
> > >>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> > >>>> on Zap firmware. Also, we can't load a secure firmware without a
> > >>>> hypervisor that supports it.
> > >>>
> > >>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
> > >>
> > >> This is better, isn't it? Otherwise, multiple overlays should be
> > >> maintained for each soc/board since EL2 can be toggled from bootloader.
> > >> And this feature is likely going to be more widely available.
> > >>
> > >
> > > The DeviceTree passed to the OS needs to describe the world that said OS
> > > is going to operate in. If you change the world you need to change the
> > > description.
> > > There are several other examples where this would be necessary
> > > (remoteproc and watchdog to name two examples from the Qualcomm upstream
> > > world).
> >
> > But basic things work without those changes, right? For eg: Desktop UI
> 
> It isn't really so much about whether certain use-cases can work with
> a sub-optimal description of the hw (where in this case "hw" really
> means "hw plus how the fw allows things to look to the HLOS").. It is
> more about the hw/fw/whatever providing an accurate description of
> what things look like to the HLOS.
> 
> I'm leaning more towards the hw+fw providing HLOS an accurate view...
> and the fact that that carries over into other areas of dtb (ie. it
> isn't the only thing that slbounce needs to patch, as I previously
> mentioned) reinforces my view there.  This seems like a thing to fix
> in fw/bootloader tbh.
> 

Thanks Rob and Bjorn for your inputs. At the moment, we don't have
capability in our bootloader to apply a *specific* overlay when Linux
kernel is starteed in EL, this is making GPU non-functional. This patch
from Akhil fixes the problem without depending on the bootloader.

From this discussion, I understand that it is recommended to provide
HW+FW view in dT correctly instead of doing runtime checks in the
kernel. We can take this as a requirement to the bootloader.

I would like to check how we should proceed with overlay. Should we
submit dtso upstream and let bootloader apply the overlay at runtime or
this whole overlay needs to be maintained in the bootloader. Also,
Should we build all board dtb for EL2 as well or just leave it at compiling
the EL2 dtbo (one per SoC)?

Thanks,
Pavan

Dmitry Baryshkov Dec. 11, 2024, 8:52 a.m. UTC | #17

On Wed, Dec 11, 2024 at 01:06:58PM +0530, Pavan Kondeti wrote:
> +devicetree
> 
> On Tue, Dec 10, 2024 at 07:43:19PM -0800, Rob Clark wrote:
> > On Tue, Dec 10, 2024 at 7:08 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> > >
> > > On 12/11/2024 6:43 AM, Bjorn Andersson wrote:
> > > > On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
> > > >> On 12/10/2024 1:24 AM, Rob Clark wrote:
> > > >>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> > > >>>>
> > > >>>> When kernel is booted in EL2, SECVID registers are accessible to the
> > > >>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> > > >>>> on Zap firmware. Also, we can't load a secure firmware without a
> > > >>>> hypervisor that supports it.
> > > >>>
> > > >>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
> > > >>
> > > >> This is better, isn't it? Otherwise, multiple overlays should be
> > > >> maintained for each soc/board since EL2 can be toggled from bootloader.
> > > >> And this feature is likely going to be more widely available.
> > > >>
> > > >
> > > > The DeviceTree passed to the OS needs to describe the world that said OS
> > > > is going to operate in. If you change the world you need to change the
> > > > description.
> > > > There are several other examples where this would be necessary
> > > > (remoteproc and watchdog to name two examples from the Qualcomm upstream
> > > > world).
> > >
> > > But basic things work without those changes, right? For eg: Desktop UI
> > 
> > It isn't really so much about whether certain use-cases can work with
> > a sub-optimal description of the hw (where in this case "hw" really
> > means "hw plus how the fw allows things to look to the HLOS").. It is
> > more about the hw/fw/whatever providing an accurate description of
> > what things look like to the HLOS.
> > 
> > I'm leaning more towards the hw+fw providing HLOS an accurate view...
> > and the fact that that carries over into other areas of dtb (ie. it
> > isn't the only thing that slbounce needs to patch, as I previously
> > mentioned) reinforces my view there.  This seems like a thing to fix
> > in fw/bootloader tbh.
> > 
> 
> Thanks Rob and Bjorn for your inputs. At the moment, we don't have
> capability in our bootloader to apply a *specific* overlay when Linux
> kernel is starteed in EL, this is making GPU non-functional. This patch
> from Akhil fixes the problem without depending on the bootloader.
> 
> From this discussion, I understand that it is recommended to provide
> HW+FW view in dT correctly instead of doing runtime checks in the
> kernel. We can take this as a requirement to the bootloader.
> 
> I would like to check how we should proceed with overlay. Should we
> submit dtso upstream and let bootloader apply the overlay at runtime or
> this whole overlay needs to be maintained in the bootloader. Also,
> Should we build all board dtb for EL2 as well or just leave it at compiling
> the EL2 dtbo (one per SoC)?

It doesn't have to be a dtbo. Instead you might just patch the DT
(Ideally via the https://github.com/U-Boot-EFI/EFI_DT_FIXUP_PROTOCOL).
I think the bootloader already changes the DT (by fixing memory sizes,
etc), so enabling or disabling ZAP & fixing several other bits and
pieces sounds logical.

Pavan Kondeti Dec. 11, 2024, 8:59 a.m. UTC | #18

On Wed, Dec 11, 2024 at 10:52:51AM +0200, Dmitry Baryshkov wrote:
> On Wed, Dec 11, 2024 at 01:06:58PM +0530, Pavan Kondeti wrote:
> > +devicetree
> > 
> > On Tue, Dec 10, 2024 at 07:43:19PM -0800, Rob Clark wrote:
> > > On Tue, Dec 10, 2024 at 7:08 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> > > >
> > > > On 12/11/2024 6:43 AM, Bjorn Andersson wrote:
> > > > > On Tue, Dec 10, 2024 at 02:22:27AM +0530, Akhil P Oommen wrote:
> > > > >> On 12/10/2024 1:24 AM, Rob Clark wrote:
> > > > >>> On Mon, Dec 9, 2024 at 12:20 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> > > > >>>>
> > > > >>>> When kernel is booted in EL2, SECVID registers are accessible to the
> > > > >>>> KMD. So we can use that to switch GPU's secure mode to avoid dependency
> > > > >>>> on Zap firmware. Also, we can't load a secure firmware without a
> > > > >>>> hypervisor that supports it.
> > > > >>>
> > > > >>> Shouldn't we do this based on whether zap node is in dtb (and not disabled)?
> > > > >>
> > > > >> This is better, isn't it? Otherwise, multiple overlays should be
> > > > >> maintained for each soc/board since EL2 can be toggled from bootloader.
> > > > >> And this feature is likely going to be more widely available.
> > > > >>
> > > > >
> > > > > The DeviceTree passed to the OS needs to describe the world that said OS
> > > > > is going to operate in. If you change the world you need to change the
> > > > > description.
> > > > > There are several other examples where this would be necessary
> > > > > (remoteproc and watchdog to name two examples from the Qualcomm upstream
> > > > > world).
> > > >
> > > > But basic things work without those changes, right? For eg: Desktop UI
> > > 
> > > It isn't really so much about whether certain use-cases can work with
> > > a sub-optimal description of the hw (where in this case "hw" really
> > > means "hw plus how the fw allows things to look to the HLOS").. It is
> > > more about the hw/fw/whatever providing an accurate description of
> > > what things look like to the HLOS.
> > > 
> > > I'm leaning more towards the hw+fw providing HLOS an accurate view...
> > > and the fact that that carries over into other areas of dtb (ie. it
> > > isn't the only thing that slbounce needs to patch, as I previously
> > > mentioned) reinforces my view there.  This seems like a thing to fix
> > > in fw/bootloader tbh.
> > > 
> > 
> > Thanks Rob and Bjorn for your inputs. At the moment, we don't have
> > capability in our bootloader to apply a *specific* overlay when Linux
> > kernel is starteed in EL, this is making GPU non-functional. This patch
> > from Akhil fixes the problem without depending on the bootloader.
> > 
> > From this discussion, I understand that it is recommended to provide
> > HW+FW view in dT correctly instead of doing runtime checks in the
> > kernel. We can take this as a requirement to the bootloader.
> > 
> > I would like to check how we should proceed with overlay. Should we
> > submit dtso upstream and let bootloader apply the overlay at runtime or
> > this whole overlay needs to be maintained in the bootloader. Also,
> > Should we build all board dtb for EL2 as well or just leave it at compiling
> > the EL2 dtbo (one per SoC)?
> 
> It doesn't have to be a dtbo. Instead you might just patch the DT
> (Ideally via the https://github.com/U-Boot-EFI/EFI_DT_FIXUP_PROTOCOL).
> I think the bootloader already changes the DT (by fixing memory sizes,
> etc), so enabling or disabling ZAP & fixing several other bits and
> pieces sounds logical.
> 

Yes, that would mean maintaining this completely outside upstream
kernel. As you said, this is currently happening for bootargs, memory
etc.

Thanks,
Pavan

Marc Zyngier Dec. 11, 2024, 10:40 a.m. UTC | #19

On Wed, 11 Dec 2024 00:37:34 +0000,
Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> 
> On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > > +{
> > > +	int ret;
> > > +
> > > +#ifdef CONFIG_ARM64
> > > +	/*
> > > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > > +	 */
> > > +	if (is_kernel_in_hyp_mode())
> > > +		goto direct_switch;
> > 
> > No, please. To check whether you are *booted* at EL2, you need to
> > check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> > none of the driver's business, really. This is still absolutely
> > disgusting from an abstraction perspective, but I guess we don't have
> > much choice here.
> > 
> 
> Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
> available for modules? Do you prefer exporting
> kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
> try something like [1]?

Ideally, neither. These were bad ideas nine years ago, and they still
are. The least ugly hack I can come up with is the patch below, and
you'd write something like:

	if (cpus_have_cap(ARM64_HAS_EL2_OWNERSHIP))
		blah();

This is obviously completely untested.

It also doesn't solve the problem of the kernel booted on bare-metal
at EL1, or with a hypervisor that doesn't change the programming
interface of the device under the guest's feet. Eventually, someone
will have to address these cases.

Thanks,

	M.

From 4823e7bb868d3ac2b938ecc4c3dbbdd460656af1 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Wed, 11 Dec 2024 10:02:25 +0000
Subject: [PATCH] arm64: Expose kernel ownership of EL2 via a capability

It appears that some drivers have to jump through a lot of hoops
to initialise correctly when running under a particular hypervisor,
while they can directly do it when running bare-metal.

Unfortunately, said hypervisor cannot be directly identified as it
doesn't implement the correct SMCCC interface, leaving the driver
with a certain amount of guesswork.

Being booted at EL2 provides at least an indication that there is no
non-nesting hypervisor, which is good enough to discriminate the
humpy hypervisor.

For this purpose, expose a new system-wide CPU capability aptly named
ARM64_HAS_EL2_OWNERSHIP, which said driver can check.

Note that this doesn't solve the problem of a kernel booted at EL1
without a hypervisor, or with a hypervisor that doesn't break the
device programming interface.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 11 +++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 36c7b29ddf9e8..8fdc3ef23d9dc 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1868,6 +1868,11 @@ static bool has_nv1(const struct arm64_cpu_capabilities *entry, int scope)
 		  is_midr_in_range_list(read_cpuid_id(), nv1_ni_list)));
 }

+static bool has_el2_ownership(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	return is_hyp_mode_available();
+}
+
 #if defined(ID_AA64MMFR0_EL1_TGRAN_LPA2) && defined(ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2)
 static bool has_lpa2_at_stage1(u64 mmfr0)
 {
@@ -3012,6 +3017,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, GCS, IMP)
 	},
 #endif
+	{
+		.desc = "Kernel owns EL2",
+		.capability = ARM64_HAS_EL2_OWNERSHIP,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_el2_ownership,
+	},
 	{},
 };

diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1e65f2fb45bd1..94ce3462e6298 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -24,6 +24,7 @@ HAS_DIT
 HAS_E0PD
 HAS_ECV
 HAS_ECV_CNTPOFF
+HAS_EL2_OWNERSHIP
 HAS_EPAN
 HAS_EVT
 HAS_FPMR

Pavan Kondeti Dec. 12, 2024, 5:31 a.m. UTC | #20

On Wed, Dec 11, 2024 at 10:40:02AM +0000, Marc Zyngier wrote:
> On Wed, 11 Dec 2024 00:37:34 +0000,
> Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > 
> > On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > > > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +#ifdef CONFIG_ARM64
> > > > +	/*
> > > > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > > > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > > > +	 */
> > > > +	if (is_kernel_in_hyp_mode())
> > > > +		goto direct_switch;
> > > 
> > > No, please. To check whether you are *booted* at EL2, you need to
> > > check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> > > none of the driver's business, really. This is still absolutely
> > > disgusting from an abstraction perspective, but I guess we don't have
> > > much choice here.
> > > 
> > 
> > Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
> > available for modules? Do you prefer exporting
> > kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
> > try something like [1]?
> 
> Ideally, neither. These were bad ideas nine years ago, and they still
> are. The least ugly hack I can come up with is the patch below, and
> you'd write something like:
> 
> 	if (cpus_have_cap(ARM64_HAS_EL2_OWNERSHIP))
> 		blah();
> 
> This is obviously completely untested.
> 

I have tested your patch. It works as intended. Thanks Marc.

> It also doesn't solve the problem of the kernel booted on bare-metal
> at EL1, or with a hypervisor that doesn't change the programming
> interface of the device under the guest's feet. Eventually, someone
> will have to address these cases.
> 

Noted, Thanks.

~Pavan

Marc Zyngier Dec. 12, 2024, 8:50 a.m. UTC | #21

On Thu, 12 Dec 2024 05:31:00 +0000,
Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> 
> On Wed, Dec 11, 2024 at 10:40:02AM +0000, Marc Zyngier wrote:
> > On Wed, 11 Dec 2024 00:37:34 +0000,
> > Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > > 
> > > On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > > > > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > > > > +{
> > > > > +	int ret;
> > > > > +
> > > > > +#ifdef CONFIG_ARM64
> > > > > +	/*
> > > > > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > > > > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > > > > +	 */
> > > > > +	if (is_kernel_in_hyp_mode())
> > > > > +		goto direct_switch;
> > > > 
> > > > No, please. To check whether you are *booted* at EL2, you need to
> > > > check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> > > > none of the driver's business, really. This is still absolutely
> > > > disgusting from an abstraction perspective, but I guess we don't have
> > > > much choice here.
> > > > 
> > > 
> > > Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
> > > available for modules? Do you prefer exporting
> > > kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
> > > try something like [1]?
> > 
> > Ideally, neither. These were bad ideas nine years ago, and they still
> > are. The least ugly hack I can come up with is the patch below, and
> > you'd write something like:
> > 
> > 	if (cpus_have_cap(ARM64_HAS_EL2_OWNERSHIP))
> > 		blah();
> > 
> > This is obviously completely untested.
> > 
> 
> I have tested your patch. It works as intended. Thanks Marc.

Note that you will probably get some push-back from the arm64
maintainers on this front, because this is a fairly incomplete (and
fragile) solution.

It would be much better if the discriminant came from the device tree.
After all, the hypervisor is fscking-up^W^Wchanging the programming
model of the GPU, and that should be reflected in the DT. Because for
all intent and purposes, this is not the same hardware anymore.

The GPU isn't the only device that needs fixing in that way: the
SMMUv3 needs to be exposed to the OS, and the PCIe ports need to be
linked to it and the ITS. So at the end of the day, detecting EL2 only
serves a limited purpose. You need to handle these cases, and might as
well put the GPU in the same bag.

Which means that you'd either have a pair of static DTs (one that
exposes the brokenness of the firmware, and one that doesn't), or you
go the dtbhack route to compose the DT at boot time.

Thanks,

	M.

Mark Rutland Dec. 12, 2024, 10:40 a.m. UTC | #22

On Thu, Dec 12, 2024 at 08:50:12AM +0000, Marc Zyngier wrote:
> On Thu, 12 Dec 2024 05:31:00 +0000,
> Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > 
> > On Wed, Dec 11, 2024 at 10:40:02AM +0000, Marc Zyngier wrote:
> > > On Wed, 11 Dec 2024 00:37:34 +0000,
> > > Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > > > 
> > > > On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > > > > > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > > > > > +{
> > > > > > +	int ret;
> > > > > > +
> > > > > > +#ifdef CONFIG_ARM64
> > > > > > +	/*
> > > > > > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > > > > > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > > > > > +	 */
> > > > > > +	if (is_kernel_in_hyp_mode())
> > > > > > +		goto direct_switch;
> > > > > 
> > > > > No, please. To check whether you are *booted* at EL2, you need to
> > > > > check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> > > > > none of the driver's business, really. This is still absolutely
> > > > > disgusting from an abstraction perspective, but I guess we don't have
> > > > > much choice here.
> > > > > 
> > > > 
> > > > Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
> > > > available for modules? Do you prefer exporting
> > > > kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
> > > > try something like [1]?
> > > 
> > > Ideally, neither. These were bad ideas nine years ago, and they still
> > > are. The least ugly hack I can come up with is the patch below, and
> > > you'd write something like:
> > > 
> > > 	if (cpus_have_cap(ARM64_HAS_EL2_OWNERSHIP))
> > > 		blah();
> > > 
> > > This is obviously completely untested.
> > > 
> > 
> > I have tested your patch. It works as intended. Thanks Marc.
> 
> Note that you will probably get some push-back from the arm64
> maintainers on this front, because this is a fairly incomplete (and
> fragile) solution.
> 
> It would be much better if the discriminant came from the device tree.
> After all, the hypervisor is fscking-up^W^Wchanging the programming
> model of the GPU, and that should be reflected in the DT. Because for
> all intent and purposes, this is not the same hardware anymore.

FWIW I agree 100%, this should be described in DT.

The cpucap doesn't describe the actual property we care about, and it
cannot in general (e.g. for nested virt). I would strongly prefer to not
have that as it's setting ourselves up for failure.

> The GPU isn't the only device that needs fixing in that way: the
> SMMUv3 needs to be exposed to the OS, and the PCIe ports need to be
> linked to it and the ITS. So at the end of the day, detecting EL2 only
> serves a limited purpose. You need to handle these cases, and might as
> well put the GPU in the same bag.
> 
> Which means that you'd either have a pair of static DTs (one that
> exposes the brokenness of the firmware, and one that doesn't), or you
> go the dtbhack route to compose the DT at boot time.

Liekwise, agreed on all of this.

Mark.

Pavan Kondeti Dec. 13, 2024, 3:05 a.m. UTC | #23

On Thu, Dec 12, 2024 at 10:40:46AM +0000, Mark Rutland wrote:
> On Thu, Dec 12, 2024 at 08:50:12AM +0000, Marc Zyngier wrote:
> > On Thu, 12 Dec 2024 05:31:00 +0000,
> > Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > > 
> > > On Wed, Dec 11, 2024 at 10:40:02AM +0000, Marc Zyngier wrote:
> > > > On Wed, 11 Dec 2024 00:37:34 +0000,
> > > > Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> > > > > 
> > > > > On Tue, Dec 10, 2024 at 09:24:03PM +0000, Marc Zyngier wrote:
> > > > > > > +static int a6xx_switch_secure_mode(struct msm_gpu *gpu)
> > > > > > > +{
> > > > > > > +	int ret;
> > > > > > > +
> > > > > > > +#ifdef CONFIG_ARM64
> > > > > > > +	/*
> > > > > > > +	 * We can access SECVID_TRUST_CNTL register when kernel is booted in EL2 mode. So, use it
> > > > > > > +	 * to switch the secure mode to avoid the dependency on zap shader.
> > > > > > > +	 */
> > > > > > > +	if (is_kernel_in_hyp_mode())
> > > > > > > +		goto direct_switch;
> > > > > > 
> > > > > > No, please. To check whether you are *booted* at EL2, you need to
> > > > > > check for is_hyp_available(). Whether the kernel runs at EL1 or EL2 is
> > > > > > none of the driver's business, really. This is still absolutely
> > > > > > disgusting from an abstraction perspective, but I guess we don't have
> > > > > > much choice here.
> > > > > > 
> > > > > 
> > > > > Thanks Marc. Any suggestions on how we can make is_hyp_mode_available()
> > > > > available for modules? Do you prefer exporting
> > > > > kvm_protected_mode_initialized and __boot_cpu_mode symbols directly or
> > > > > try something like [1]?
> > > > 
> > > > Ideally, neither. These were bad ideas nine years ago, and they still
> > > > are. The least ugly hack I can come up with is the patch below, and
> > > > you'd write something like:
> > > > 
> > > > 	if (cpus_have_cap(ARM64_HAS_EL2_OWNERSHIP))
> > > > 		blah();
> > > > 
> > > > This is obviously completely untested.
> > > > 
> > > 
> > > I have tested your patch. It works as intended. Thanks Marc.
> > 
> > Note that you will probably get some push-back from the arm64
> > maintainers on this front, because this is a fairly incomplete (and
> > fragile) solution.
> > 
> > It would be much better if the discriminant came from the device tree.
> > After all, the hypervisor is fscking-up^W^Wchanging the programming
> > model of the GPU, and that should be reflected in the DT. Because for
> > all intent and purposes, this is not the same hardware anymore.
> 
> FWIW I agree 100%, this should be described in DT.
> 
> The cpucap doesn't describe the actual property we care about, and it
> cannot in general (e.g. for nested virt). I would strongly prefer to not
> have that as it's setting ourselves up for failure.
> 
Thanks for the feedback. I understand that EL2 detection in kernel is
not going to cover cases like bare metal EL1, nested virtualization. We
plan to take the DT approach. 

Thanks,
Pavan

drm/msm/a6xx: Skip gpu secure fw load in EL2 mode

Commit Message

Comments

Patch