
drm/i915/guc: Remove racy GEM_BUG_ON

Message ID 20211209185141.21292-1-matthew.brost@intel.com (mailing list archive)
State New, archived

Commit Message

Matthew Brost Dec. 9, 2021, 6:51 p.m. UTC
A full GT reset can race with the last context put, resulting in the
context ref count being zero but the destroyed bit not yet being set.
Remove the GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts
the destroyed bit must be set if the ref count is zero.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 --
 1 file changed, 2 deletions(-)
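
The window described in the commit message can be sketched with a small
user-space model. This is only an illustration under stated assumptions:
struct model_ctx, get_unless_zero() and removed_assert_fires() are
hypothetical names, not i915 code, and the model covers only the two
values the removed assertion looked at (whether a reference could still
be taken, and whether the destroyed bit was set).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-context state the scrub loop inspects. */
struct model_ctx {
	int  refcount;   /* 0 once the last put has happened */
	bool destroyed;  /* set later, by the destroy path */
};

/* Models "take a reference only if the count is not already zero". */
static bool get_unless_zero(struct model_ctx *ce)
{
	if (ce->refcount == 0)
		return false;
	ce->refcount++;
	return true;
}

/*
 * Evaluates the condition of the removed assertion,
 * GEM_BUG_ON(!do_put && !destroyed), and reports whether it would fire.
 */
static bool removed_assert_fires(struct model_ctx *ce)
{
	bool do_put = get_unless_zero(ce);
	bool destroyed = ce->destroyed;
	bool fires = !do_put && !destroyed;

	if (do_put)
		ce->refcount--;	/* drop the reference taken above */

	return fires;
}
```

In this model a context with refcount 0 and destroyed == false (the last
put has happened, the destroyed bit is not yet set) is the only state in
which the condition is true, which is the state a full GT reset can
observe.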

Comments

Daniele Ceraolo Spurio Dec. 9, 2021, 7:26 p.m. UTC | #1
On 12/9/2021 10:51 AM, Matthew Brost wrote:
> A full GT can race with the last context put resulting in the context
> ref count being zero but the destroyed bit not yet being set. Remove
> GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the
> destroyed bit must be set in ref count is zero.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 --
>   1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9b7b4f4e0d91..0f99bb83293a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   
>   		spin_unlock(&ce->guc_state.lock);
>   
> -		GEM_BUG_ON(!do_put && !destroyed);
> -

Do we need to re-queue/flush the destroyer work to make sure it runs 
before we reset, or is it ok for that to run in parallel?

Daniele

>   		if (pending_enable || destroyed || deregister) {
>   			decr_outstanding_submission_g2h(guc);
>   			if (deregister)
Matthew Brost Dec. 9, 2021, 7:57 p.m. UTC | #2
On Thu, Dec 09, 2021 at 11:26:09AM -0800, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/9/2021 10:51 AM, Matthew Brost wrote:
> > A full GT can race with the last context put resulting in the context
> > ref count being zero but the destroyed bit not yet being set. Remove
> > GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the
> > destroyed bit must be set in ref count is zero.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 --
> >   1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 9b7b4f4e0d91..0f99bb83293a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> >   		spin_unlock(&ce->guc_state.lock);
> > -		GEM_BUG_ON(!do_put && !destroyed);
> > -
> 
> Do we need to re-queue/flush the destroyer work to make sure it runs before
> we reset, or is it ok for that to run in parallel?
> 

The code in the put path will either see the reset or see that the
context isn't registered, and in both cases it destroys the context
without any interaction with the GuC.

Matt
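
The decision described in the reply above can be written down as a tiny
model. This is a sketch under assumptions, not the driver's code:
put_path_action() and the enum are hypothetical names, and the
"otherwise" branch (deregistering through the GuC) is assumed here for
contrast rather than stated in the thread.

```c
#include <stdbool.h>

/* Hypothetical outcome of the last context put. */
enum destroy_action {
	DESTROY_LOCALLY,     /* free the context, no GuC interaction */
	DEREGISTER_VIA_GUC,  /* assumed normal path when neither case applies */
};

/*
 * Models the reply: if the put path sees the reset, or sees that the
 * context is not registered, it destroys the context without talking
 * to the GuC.
 */
static enum destroy_action put_path_action(bool reset_in_progress,
					   bool registered)
{
	if (reset_in_progress || !registered)
		return DESTROY_LOCALLY;

	return DEREGISTER_VIA_GUC;
}
```

Under this model the destroy path is safe to run in parallel with the
reset, because in both racing cases it never waits on a response from
the GuC.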

> Daniele
> 
> >   		if (pending_enable || destroyed || deregister) {
> >   			decr_outstanding_submission_g2h(guc);
> >   			if (deregister)
>
Daniele Ceraolo Spurio Dec. 9, 2021, 8:04 p.m. UTC | #3
On 12/9/2021 11:57 AM, Matthew Brost wrote:
> On Thu, Dec 09, 2021 at 11:26:09AM -0800, Daniele Ceraolo Spurio wrote:
>>
>> On 12/9/2021 10:51 AM, Matthew Brost wrote:
>>> A full GT can race with the last context put resulting in the context

forgot to mention earlier but you're missing "reset" here

>>> ref count being zero but the destroyed bit not yet being set. Remove
>>> GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the
>>> destroyed bit must be set in ref count is zero.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 --
>>>    1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 9b7b4f4e0d91..0f99bb83293a 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>>>    		spin_unlock(&ce->guc_state.lock);
>>> -		GEM_BUG_ON(!do_put && !destroyed);
>>> -
>> Do we need to re-queue/flush the destroyer work to make sure it runs before
>> we reset, or is it ok for that to run in parallel?
>>
> The code in the put path will either see the reset or that it isn't
> registered and destroy the context without any interaction with the GuC.

ok.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele

>
> Matt
>
>> Daniele
>>
>>>    		if (pending_enable || destroyed || deregister) {
>>>    			decr_outstanding_submission_g2h(guc);
>>>    			if (deregister)

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9b7b4f4e0d91..0f99bb83293a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1040,8 +1040,6 @@  static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 
 		spin_unlock(&ce->guc_state.lock);
 
-		GEM_BUG_ON(!do_put && !destroyed);
-
 		if (pending_enable || destroyed || deregister) {
 			decr_outstanding_submission_g2h(guc);
 			if (deregister)