
drm/i915/guc: Initialize GuC submission locks and queues early

Message ID 20220215011123.734572-1-daniele.ceraolospurio@intel.com (mailing list archive)
State New, archived
Series drm/i915/guc: Initialize GuC submission locks and queues early

Commit Message

Daniele Ceraolo Spurio Feb. 15, 2022, 1:11 a.m. UTC
Move initialization of submission-related spinlock, lists and workers to
init_early. This fixes an issue where if the GuC init fails we might
still try to get the lock in the context cleanup code. Note that it is
safe to call the GuC context cleanup code even if the init failed
because all contexts are initialized with an invalid GuC ID, which will
cause the GuC side of the cleanup to be skipped, so it is easier to just
make sure the variables are initialized than to special case the cleanup
to handle the case when they're not.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++++++++++---------
 1 file changed, 14 insertions(+), 13 deletions(-)
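
For context, the safety argument in the commit message can be sketched as follows: every context starts out with an invalid GuC ID, so the context-release path skips the GuC side of the teardown, and taking the (now early-initialized) submission lock during cleanup does nothing harmful even when intel_guc_submission_init() failed. The sketch below is illustrative only -- the example_* names and the sentinel value are hypothetical, not the actual i915 code; only the fields shown in the patch and standard kernel APIs are assumed.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include "intel_guc.h"	/* struct intel_guc and its submission_state fields */

#define EXAMPLE_INVALID_GUC_ID	(~0u)	/* hypothetical "never registered" marker */

struct example_context {
	u32 guc_id;			/* starts out as EXAMPLE_INVALID_GUC_ID */
	struct list_head destroyed_link;
};

static void example_context_destroy(struct intel_guc *guc,
				    struct example_context *ce)
{
	unsigned long flags;

	/* Safe even after a failed init: the lock is set up in init_early(). */
	spin_lock_irqsave(&guc->submission_state.lock, flags);

	if (ce->guc_id != EXAMPLE_INVALID_GUC_ID)
		/* Only contexts actually registered with the GuC need teardown. */
		list_add_tail(&ce->destroyed_link,
			      &guc->submission_state.destroyed_contexts);

	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
}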

Comments

Tvrtko Ursulin Feb. 15, 2022, 9:09 a.m. UTC | #1
On 15/02/2022 01:11, Daniele Ceraolo Spurio wrote:
> Move initialization of submission-related spinlock, lists and workers to
> init_early. This fixes an issue where if the GuC init fails we might
> still try to get the lock in the context cleanup code. Note that it is

What's the worst case impact on non-debug builds aka is Fixes: required?

Regards,

Tvrtko

> safe to call the GuC context cleanup code even if the init failed
> because all contexts are initialized with an invalid GuC ID, which will
> cause the GuC side of the cleanup to be skipped, so it is easier to just
> make sure the variables are initialized than to special case the cleanup
> to handle the case when they're not.
> 
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> [snip]
Daniele Ceraolo Spurio Feb. 15, 2022, 4:39 p.m. UTC | #2
On 2/15/2022 1:09 AM, Tvrtko Ursulin wrote:
>
> On 15/02/2022 01:11, Daniele Ceraolo Spurio wrote:
>> Move initialization of submission-related spinlock, lists and workers to
>> init_early. This fixes an issue where if the GuC init fails we might
>> still try to get the lock in the context cleanup code. Note that it is
>
> What's the worst case impact on non-debug builds aka is Fixes: required?

There is no lock contention in this scenario and nothing is done within 
the locked section (because submission is not initialized and all 
contexts are marked as invalid), so there are no problems from the fact 
that the lock doesn't work. Is that enough to avoid a Fixes: tag?

Daniele
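
To illustrate the scenario described above: a minimal, purely illustrative
sketch of a destroyed-contexts worker (the function name is hypothetical, not
the actual i915 worker). When GuC init has failed, no context was ever added
to the list, so the locked section only observes an empty list and drops the
lock again.

#include <linux/workqueue.h>
#include "intel_guc.h"	/* struct intel_guc, submission_state fields */

static void example_destroyed_worker(struct work_struct *w)
{
	struct intel_guc *guc = container_of(w, struct intel_guc,
					     submission_state.destroyed_worker);
	unsigned long flags;

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	if (list_empty(&guc->submission_state.destroyed_contexts)) {
		/* Failed-init case: nothing to do, the lock is held only briefly. */
		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
		return;
	}
	/* Normal case: contexts would be deregistered from the GuC here. */
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
}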

> [snip]
Tvrtko Ursulin Feb. 16, 2022, 11:54 a.m. UTC | #3
On 15/02/2022 16:39, Ceraolo Spurio, Daniele wrote:
> On 2/15/2022 1:09 AM, Tvrtko Ursulin wrote:
>>
>> On 15/02/2022 01:11, Daniele Ceraolo Spurio wrote:
>>> Move initialization of submission-related spinlock, lists and workers to
>>> init_early. This fixes an issue where if the GuC init fails we might
>>> still try to get the lock in the context cleanup code. Note that it is
>>
>> What's the worst case impact on non-debug builds aka is Fixes: required?
> 
> There is no lock contention in this scenario and nothing is done within 
> the locked section (because submission is not initialized and all 
> contexts are marked as invalid), so there are no problems from the fact 
> that the lock doesn't work. Is that enough to avoid a Fixes: tag?

If the "lock doesn't work" is benign due no chance of touching 
uninitialised memory and then deadlocking then it's good. I just 
couldn't read that part from the commit message and did not have time to 
go dig into the code.

Regards,

Tvrtko

> [snip]
John Harrison Feb. 19, 2022, 12:54 a.m. UTC | #4
On 2/14/2022 17:11, Daniele Ceraolo Spurio wrote:
> Move initialization of submission-related spinlock, lists and workers to
> init_early. This fixes an issue where if the GuC init fails we might
> still try to get the lock in the context cleanup code. Note that it is
> safe to call the GuC context cleanup code even if the init failed
> because all contexts are initialized with an invalid GuC ID, which will
> cause the GuC side of the cleanup to be skipped, so it is easier to just
> make sure the variables are initialized than to special case the cleanup
> to handle the case when they're not.
>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> [snip]

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b3a429a92c0da..2160da2c83cbf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1818,24 +1818,11 @@  int intel_guc_submission_init(struct intel_guc *guc)
 	 */
 	GEM_BUG_ON(!guc->lrc_desc_pool);
 
-	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
-
-	spin_lock_init(&guc->submission_state.lock);
-	INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
-	ida_init(&guc->submission_state.guc_ids);
-	INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
-	INIT_WORK(&guc->submission_state.destroyed_worker,
-		  destroyed_worker_func);
-	INIT_WORK(&guc->submission_state.reset_fail_worker,
-		  reset_fail_worker_func);
-
 	guc->submission_state.guc_ids_bitmap =
 		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
 	if (!guc->submission_state.guc_ids_bitmap)
 		return -ENOMEM;
 
-	spin_lock_init(&guc->timestamp.lock);
-	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
 	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
 	guc->timestamp.shift = gpm_timestamp_shift(gt);
 
@@ -3831,6 +3818,20 @@  static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
+	spin_lock_init(&guc->submission_state.lock);
+	INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
+	ida_init(&guc->submission_state.guc_ids);
+	INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
+	INIT_WORK(&guc->submission_state.destroyed_worker,
+		  destroyed_worker_func);
+	INIT_WORK(&guc->submission_state.reset_fail_worker,
+		  reset_fail_worker_func);
+
+	spin_lock_init(&guc->timestamp.lock);
+	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
+
 	guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
 	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
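
For reference, a simplified and purely illustrative sketch of the ordering this
patch relies on: intel_guc_submission_init_early() now sets up every lock, list
and worker unconditionally, so the context cleanup code remains safe even when
the fallible part of submission init (for example the bitmap_zalloc() above)
returns an error. The wrapper function below is hypothetical -- the real call
chain goes through the intel_uc init paths.

/* Hypothetical wrapper, for illustration only -- not the real i915 call chain. */
static int example_guc_submission_setup(struct intel_guc *guc)
{
	int err;

	/* Cannot fail: locks, lists, workers and the xarray exist from here on. */
	intel_guc_submission_init_early(guc);

	err = intel_guc_submission_init(guc);	/* may fail, e.g. -ENOMEM */
	if (err) {
		/*
		 * Context cleanup can still run after this point; with the
		 * locks initialized in init_early() that is safe, and the
		 * invalid GuC IDs make the GuC-side teardown a no-op.
		 */
		return err;
	}

	return 0;
}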