
[v5,5/5] drm/i915/gt: Make sure that errors are propagated through request chains

Message ID 20230412113308.812468-6-andi.shyti@linux.intel.com (mailing list archive)
State New, archived
Series: Fix error propagation amongst request

Commit Message

Andi Shyti April 12, 2023, 11:33 a.m. UTC
Currently, when we perform operations such as clearing or copying
large blocks of memory, we generate multiple requests that are
executed in a chain.

However, if one of these requests fails, we may not realize it
unless it happens to be the last request in the chain. This is
because errors are not properly propagated.

For this we need to keep propagating the chain of fence
notifications in order to always reach the final fence associated
with the final request.

To address this issue, we need to ensure that the chain of fence
notifications is always propagated so that we can reach the final
fence associated with the last request. By doing so, we will be
able to detect any memory operation failures and determine
whether the memory is still invalid.

On copy and clear migration, signal fences upon completion.

On copy and clear migration, signal fences upon request
completion to ensure that the operation's outcome is reliably
propagated.

Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
Reported-by: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 51 +++++++++++++++++++------
 1 file changed, 39 insertions(+), 12 deletions(-)
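
For orientation, here is a condensed sketch of the per-chunk pattern the
patch introduces (all identifiers come from the diff below; "more_chunks"
stands in for the loop's real termination tests). Each request holds its
own submit fence from creation until its successor is queued, so the
chain can carry an error all the way to the final fence returned in *out:

	mutex_lock(&ce->timeline->mutex);	/* keep other clients off the timeline */
	do {
		rq = i915_request_create_locked(ce);
		if (IS_ERR(rq)) {
			err = PTR_ERR(rq);
			break;
		}

		/* ... emit the copy/clear commands for this chunk ... */

		i915_sw_fence_await(&rq->submit);	/* hold rq until its successor is queued */
		i915_request_get(rq);
		i915_request_add_locked(rq);
		if (*out) {
			i915_sw_fence_complete(&(*out)->submit);	/* release the predecessor */
			i915_request_put(*out);
		}
		*out = rq;
	} while (!err && more_chunks);
	mutex_unlock(&ce->timeline->mutex);

	if (*out)
		i915_sw_fence_complete(&(*out)->submit);	/* release the tail of the chain */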

Comments

Tvrtko Ursulin April 13, 2023, 11:56 a.m. UTC | #1
On 12/04/2023 12:33, Andi Shyti wrote:
> Currently, when we perform operations such as clearing or copying
> large blocks of memory, we generate multiple requests that are
> executed in a chain.
> 
> However, if one of these requests fails, we may not realize it
> unless it happens to be the last request in the chain. This is
> because errors are not properly propagated.
> 
> For this we need to keep propagating the chain of fence
> notifications in order to always reach the final fence associated
> with the final request.
> 
> To address this issue, we need to ensure that the chain of fence
> notifications is always propagated so that we can reach the final
> fence associated with the last request. By doing so, we will be
> able to detect any memory operation failures and determine
> whether the memory is still invalid.

The above two paragraphs seem to be redundant in the message they convey.

> On copy and clear migration, signal fences upon completion.
> 
> On copy and clear migration, signal fences upon request
> completion to ensure that the operation's outcome is reliably
> propagated.

These two as well. So I think the commit message could be polished a bit.

> Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
> Reported-by: Matthew Auld <matthew.auld@intel.com>
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_migrate.c | 51 +++++++++++++++++++------
>   1 file changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 3f638f1987968..668c95af8cbcf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce,
>   			dst_offset = 2 * CHUNK_SZ;
>   	}
>   
> +	/*
> +	 * While building the chain of requests, we need to ensure
> +	 * that no one can sneak into the timeline unnoticed.
> +	 */
> +	mutex_lock(&ce->timeline->mutex);
> +
>   	do {
>   		int len;
>   
> -		rq = i915_request_create(ce);
> +		rq = i915_request_create_locked(ce);
>   		if (IS_ERR(rq)) {
>   			err = PTR_ERR(rq);
> -			goto out_ce;
> +			break;
>   		}
>   
>   		if (deps) {
> @@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce,
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
> -		if (*out)
> +		i915_sw_fence_await(&rq->submit);
> +		i915_request_get(rq);
> +		i915_request_add_locked(rq);
> +		if (*out) {
> +			i915_sw_fence_complete(&(*out)->submit);
>   			i915_request_put(*out);

Could you help me understand this, please? I have a few questions -
first, what are the actual mechanics of fence error transfer here? I see
the submit fence is being blocked until the next request is submitted -
effectively the previous request is only allowed onto the hardware
after the next one has been queued up. But I don't immediately see what
that does in practice.

Second question relates to the need to hold the timeline mutex 
throughout. Presumably this is so two copy or migrate operations on the 
same context do not interleave, which can otherwise happen?

Would the error propagation be doable without the lock held by chaining 
on the previous request _completion_ fence? If so I am sure that would 
have a performance impact, because chunk by chunk would need a GPU<->CPU 
round trip to schedule. How much of an impact I don't know. Maybe 
enlarging CHUNK_SZ to compensate is an option?
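
For illustration, a rough sketch of that alternative (hypothetical - not
what this patch does), using the existing i915_request_await_dma_fence()
helper to chain each chunk on the previous request's completion fence:

	rq = i915_request_create(ce);	/* no timeline lock held */
	if (IS_ERR(rq)) {
		err = PTR_ERR(rq);
		goto out_ce;
	}
	/*
	 * Serialise this chunk on the *completion* of the previous one,
	 * at the cost of a GPU<->CPU round trip per chunk. Whether the
	 * previous fence's error actually propagates into rq here depends
	 * on the fence error propagation behaviour discussed later in
	 * this thread.
	 */
	if (*out)
		err = i915_request_await_dma_fence(rq, &(*out)->fence);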

Or, if the perf hit would be bearable, that could be used for stable
backports only (a much smaller patch) and then for tip we can do this
full-speed solution.

But yes, I would first want to understand the actual error propagation 
mechanism because sadly my working knowledge is a bit rusty.

> -		*out = i915_request_get(rq);
> -		i915_request_add(rq);
> +		}
> +		*out = rq;
>   
>   		if (err)
>   			break;
> @@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		cond_resched();
>   	} while (1);
>   
> -out_ce:
> +	mutex_unlock(&ce->timeline->mutex);
> +
> +	if (*out)
> +		i915_sw_fence_complete(&(*out)->submit);
>   	return err;
>   }
>   
> @@ -999,13 +1012,19 @@ intel_context_migrate_clear(struct intel_context *ce,
>   	if (HAS_64K_PAGES(i915) && is_lmem)
>   		offset = CHUNK_SZ;
>   
> +	/*
> +	 * While building the chain of requests, we need to ensure
> +	 * that no one can sneak into the timeline unnoticed.
> +	 */
> +	mutex_lock(&ce->timeline->mutex);
> +
>   	do {
>   		int len;
>   
> -		rq = i915_request_create(ce);
> +		rq = i915_request_create_locked(ce);
>   		if (IS_ERR(rq)) {
>   			err = PTR_ERR(rq);
> -			goto out_ce;
> +			break;
>   		}
>   
>   		if (deps) {
> @@ -1056,17 +1075,25 @@ intel_context_migrate_clear(struct intel_context *ce,
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
> -		if (*out)
> +		i915_sw_fence_await(&rq->submit);
> +		i915_request_get(rq);
> +		i915_request_add_locked(rq);
> +		if (*out) {
> +			i915_sw_fence_complete(&(*out)->submit);
>   			i915_request_put(*out);
> -		*out = i915_request_get(rq);
> -		i915_request_add(rq);
> +		}
> +		*out = rq;

Btw if all else fails perhaps these two blocks can be consolidated into
something like __chain_requests(rq, out) with all these operations in it.
Not sure how much that would save in the grand total.
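
For illustration, one possible shape for such a helper - the name
__chain_requests is only a suggestion, and the body is lifted verbatim
from the duplicated blocks in the diff:

	static void __chain_requests(struct i915_request *rq,
				     struct i915_request **out)
	{
		/* Hold rq's submit fence until its successor is queued. */
		i915_sw_fence_await(&rq->submit);
		i915_request_get(rq);
		i915_request_add_locked(rq);

		/* The new link is queued: release the predecessor. */
		if (*out) {
			i915_sw_fence_complete(&(*out)->submit);
			i915_request_put(*out);
		}
		*out = rq;
	}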

Regards,

Tvrtko

> +
>   		if (err || !it.sg || !sg_dma_len(it.sg))
>   			break;
>   
>   		cond_resched();
>   	} while (1);
>   
> -out_ce:
> +	mutex_unlock(&ce->timeline->mutex);
> +
> +	if (*out)
> +		i915_sw_fence_complete(&(*out)->submit);
>   	return err;
>   }
>
Tvrtko Ursulin April 13, 2023, 12:43 p.m. UTC | #2
On 13/04/2023 12:56, Tvrtko Ursulin wrote:
> 
> On 12/04/2023 12:33, Andi Shyti wrote:
>> Currently, when we perform operations such as clearing or copying
>> large blocks of memory, we generate multiple requests that are
>> executed in a chain.
>>
>> However, if one of these requests fails, we may not realize it
>> unless it happens to be the last request in the chain. This is
>> because errors are not properly propagated.
>>
>> For this we need to keep propagating the chain of fence
>> notifications in order to always reach the final fence associated
>> with the final request.
>>
>> To address this issue, we need to ensure that the chain of fence
>> notifications is always propagated so that we can reach the final
>> fence associated with the last request. By doing so, we will be
>> able to detect any memory operation failures and determine
>> whether the memory is still invalid.
> 
> The above two paragraphs seem to be redundant in the message they convey.
> 
>> On copy and clear migration, signal fences upon completion.
>>
>> On copy and clear migration, signal fences upon request
>> completion to ensure that the operation's outcome is reliably
>> propagated.
> 
> These two as well. So I think the commit message could be polished a bit.
> 
>> Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
>> Reported-by: Matthew Auld <matthew.auld@intel.com>
>> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_migrate.c | 51 +++++++++++++++++++------
>>   1 file changed, 39 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
>> b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> index 3f638f1987968..668c95af8cbcf 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> @@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context 
>> *ce,
>>               dst_offset = 2 * CHUNK_SZ;
>>       }
>> +    /*
>> +     * While building the chain of requests, we need to ensure
>> +     * that no one can sneak into the timeline unnoticed.
>> +     */
>> +    mutex_lock(&ce->timeline->mutex);
>> +
>>       do {
>>           int len;
>> -        rq = i915_request_create(ce);
>> +        rq = i915_request_create_locked(ce);
>>           if (IS_ERR(rq)) {
>>               err = PTR_ERR(rq);
>> -            goto out_ce;
>> +            break;
>>           }
>>           if (deps) {
>> @@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context 
>> *ce,
>>           /* Arbitration is re-enabled between requests. */
>>   out_rq:
>> -        if (*out)
>> +        i915_sw_fence_await(&rq->submit);
>> +        i915_request_get(rq);
>> +        i915_request_add_locked(rq);
>> +        if (*out) {
>> +            i915_sw_fence_complete(&(*out)->submit);
>>               i915_request_put(*out);
> 
> Could you help me understand this, please? I have a few questions -
> first, what are the actual mechanics of fence error transfer here? I see
> the submit fence is being blocked until the next request is submitted -
> effectively the previous request is only allowed onto the hardware
> after the next one has been queued up. But I don't immediately see what
> that does in practice.
> 
> Second question relates to the need to hold the timeline mutex 
> throughout. Presumably this is so two copy or migrate operations on the 
> same context do not interleave, which can otherwise happen?
> 
> Would the error propagation be doable without the lock held by chaining 
> on the previous request _completion_ fence? If so I am sure that would 
> have a performance impact, because chunk by chunk would need a GPU<->CPU 
> round trip to schedule. How much of an impact I don't know. Maybe 
> enlarging CHUNK_SZ to compensate is an option?
> 
> Or, if the perf hit would be bearable, that could be used for stable
> backports only (a much smaller patch) and then for tip we can do this
> full-speed solution.
> 
> But yes, I would first want to understand the actual error propagation 
> mechanism because sadly my working knowledge is a bit rusty.

Another option - maybe - is this related to the revert of fence error
propagation? If it is, and having that would avoid the need for this
invasive fix, maybe we unrevert 3761baae908a7b5012be08d70fa553cc2eb82305
with edits to limit it to special contexts? If doable...

Regards,

Tvrtko

> 
>> -        *out = i915_request_get(rq);
>> -        i915_request_add(rq);
>> +        }
>> +        *out = rq;
>>           if (err)
>>               break;
>> @@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
>>           cond_resched();
>>       } while (1);
>> -out_ce:
>> +    mutex_unlock(&ce->timeline->mutex);
>> +
>> +    if (*out)
>> +        i915_sw_fence_complete(&(*out)->submit);
>>       return err;
>>   }
>> @@ -999,13 +1012,19 @@ intel_context_migrate_clear(struct 
>> intel_context *ce,
>>       if (HAS_64K_PAGES(i915) && is_lmem)
>>           offset = CHUNK_SZ;
>> +    /*
>> +     * While building the chain of requests, we need to ensure
>> +     * that no one can sneak into the timeline unnoticed.
>> +     */
>> +    mutex_lock(&ce->timeline->mutex);
>> +
>>       do {
>>           int len;
>> -        rq = i915_request_create(ce);
>> +        rq = i915_request_create_locked(ce);
>>           if (IS_ERR(rq)) {
>>               err = PTR_ERR(rq);
>> -            goto out_ce;
>> +            break;
>>           }
>>           if (deps) {
>> @@ -1056,17 +1075,25 @@ intel_context_migrate_clear(struct 
>> intel_context *ce,
>>           /* Arbitration is re-enabled between requests. */
>>   out_rq:
>> -        if (*out)
>> +        i915_sw_fence_await(&rq->submit);
>> +        i915_request_get(rq);
>> +        i915_request_add_locked(rq);
>> +        if (*out) {
>> +            i915_sw_fence_complete(&(*out)->submit);
>>               i915_request_put(*out);
>> -        *out = i915_request_get(rq);
>> -        i915_request_add(rq);
>> +        }
>> +        *out = rq;
> 
> Btw if all else fails perhaps these two blocks can be consolidated into
> something like __chain_requests(rq, out) with all these operations in it.
> Not sure how much that would save in the grand total.
> 
> Regards,
> 
> Tvrtko
> 
>> +
>>           if (err || !it.sg || !sg_dma_len(it.sg))
>>               break;
>>           cond_resched();
>>       } while (1);
>> -out_ce:
>> +    mutex_unlock(&ce->timeline->mutex);
>> +
>> +    if (*out)
>> +        i915_sw_fence_complete(&(*out)->submit);
>>       return err;
>>   }
Andi Shyti May 4, 2023, 9:37 a.m. UTC | #3
Hi Tvrtko,

sorry for the very late reply; it's about time to bring this
patch up.

On Thu, Apr 13, 2023 at 12:56:00PM +0100, Tvrtko Ursulin wrote:
> 
> On 12/04/2023 12:33, Andi Shyti wrote:
> > Currently, when we perform operations such as clearing or copying
> > large blocks of memory, we generate multiple requests that are
> > executed in a chain.
> > 
> > However, if one of these requests fails, we may not realize it
> > unless it happens to be the last request in the chain. This is
> > because errors are not properly propagated.
> > 
> > For this we need to keep propagating the chain of fence
> > notifications in order to always reach the final fence associated
> > with the final request.
> > 
> > To address this issue, we need to ensure that the chain of fence
> > notifications is always propagated so that we can reach the final
> > fence associated with the last request. By doing so, we will be
> > able to detect any memory operation failures and determine
> > whether the memory is still invalid.
> 
> The above two paragraphs seem to be redundant in the message they convey.
> 
> > On copy and clear migration, signal fences upon completion.
> > 
> > On copy and clear migration, signal fences upon request
> > completion to ensure that the operation's outcome is reliably
> > propagated.
> 
> These two as well. So I think the commit message could be polished a bit.

In my intent to be very explicit I might have exaggerated.
I know that these kinds of patches might bring some controversy.

I will rework the commit message.

> > Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
> > Reported-by: Matthew Auld <matthew.auld@intel.com>
> > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_migrate.c | 51 +++++++++++++++++++------
> >   1 file changed, 39 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index 3f638f1987968..668c95af8cbcf 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce,
> >   			dst_offset = 2 * CHUNK_SZ;
> >   	}
> > +	/*
> > +	 * While building the chain of requests, we need to ensure
> > +	 * that no one can sneak into the timeline unnoticed.
> > +	 */
> > +	mutex_lock(&ce->timeline->mutex);
> > +
> >   	do {
> >   		int len;
> > -		rq = i915_request_create(ce);
> > +		rq = i915_request_create_locked(ce);
> >   		if (IS_ERR(rq)) {
> >   			err = PTR_ERR(rq);
> > -			goto out_ce;
> > +			break;
> >   		}
> >   		if (deps) {
> > @@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce,
> >   		/* Arbitration is re-enabled between requests. */
> >   out_rq:
> > -		if (*out)
> > +		i915_sw_fence_await(&rq->submit);
> > +		i915_request_get(rq);
> > +		i915_request_add_locked(rq);
> > +		if (*out) {
> > +			i915_sw_fence_complete(&(*out)->submit);
> >   			i915_request_put(*out);
> 
> Could you help me understand this, please? I have a few questions - first,
> what are the actual mechanics of fence error transfer here? I see the submit
> fence is being blocked until the next request is submitted - effectively
> the previous request is only allowed onto the hardware after the next one
> has been queued up. But I don't immediately see what that does in practice.

This is the basis of the error propagation. Without this
serialization, for big operations like migrate and copy, we would
only catch an error that happens in the last rq.
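
For contrast, the old per-chunk tail (the lines this patch removes)
submitted every request independently and simply replaced *out with the
newest one, so the returned fence only ever reflected the last chunk:

	if (*out)
		i915_request_put(*out);
	*out = i915_request_get(rq);
	i915_request_add(rq);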

> Second question relates to the need to hold the timeline mutex throughout.
> Presumably this is so two copy or migrate operations on the same context do
> not interleave, which can otherwise happen?
> 
> Would the error propagation be doable without the lock held by chaining on
> the previous request _completion_ fence? If so I am sure that would have a
> performance impact, because chunk by chunk would need a GPU<->CPU round trip
> to schedule. How much of an impact I don't know. Maybe enlarging CHUNK_SZ to
> compensate is an option?

The need for a mutex lock comes from adding the throttle during
request creation, which ensures no pending requests are being
served.

I will copy-paste from Chris' review, which was missed on the
mailing list:

Adding a large throttle before the mutex makes the race less
likely, but to overcome that just increase the number of
simultaneous clients fighting for ring space.

If we hold the lock while constructing the chain, no one else may
inject themselves between links in our chain. If we do not, we
may end up with

ABCDEFGHI
^head   ^tail

Then in order for A to submit its next request it has to wait
upon its previous request. But since we are holding the submit
fence for A, it will not be executed until after we complete our
submission. Boom.

Andi

> Or, if the perf hit would be bearable, that could be used for stable
> backports only (a much smaller patch) and then for tip we can do this
> full-speed solution.
> 
> But yes, I would first want to understand the actual error propagation
> mechanism because sadly my working knowledge is a bit rusty.
> 
> > -		*out = i915_request_get(rq);
> > -		i915_request_add(rq);
> > +		}
> > +		*out = rq;
> >   		if (err)
> >   			break;
> > @@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
> >   		cond_resched();
> >   	} while (1);
> > -out_ce:
> > +	mutex_unlock(&ce->timeline->mutex);
> > +
> > +	if (*out)
> > +		i915_sw_fence_complete(&(*out)->submit);
> >   	return err;
> >   }
> > @@ -999,13 +1012,19 @@ intel_context_migrate_clear(struct intel_context *ce,
> >   	if (HAS_64K_PAGES(i915) && is_lmem)
> >   		offset = CHUNK_SZ;
> > +	/*
> > +	 * While building the chain of requests, we need to ensure
> > +	 * that no one can sneak into the timeline unnoticed.
> > +	 */
> > +	mutex_lock(&ce->timeline->mutex);
> > +
> >   	do {
> >   		int len;
> > -		rq = i915_request_create(ce);
> > +		rq = i915_request_create_locked(ce);
> >   		if (IS_ERR(rq)) {
> >   			err = PTR_ERR(rq);
> > -			goto out_ce;
> > +			break;
> >   		}
> >   		if (deps) {
> > @@ -1056,17 +1075,25 @@ intel_context_migrate_clear(struct intel_context *ce,
> >   		/* Arbitration is re-enabled between requests. */
> >   out_rq:
> > -		if (*out)
> > +		i915_sw_fence_await(&rq->submit);
> > +		i915_request_get(rq);
> > +		i915_request_add_locked(rq);
> > +		if (*out) {
> > +			i915_sw_fence_complete(&(*out)->submit);
> >   			i915_request_put(*out);
> > -		*out = i915_request_get(rq);
> > -		i915_request_add(rq);
> > +		}
> > +		*out = rq;
> 
> Btw if all else fails perhaps these two blocks can be consolidated into
> something like __chain_requests(rq, out) with all these operations in it.
> Not sure how much that would save in the grand total.
> 
> Regards,
> 
> Tvrtko
> 
> > +
> >   		if (err || !it.sg || !sg_dma_len(it.sg))
> >   			break;
> >   		cond_resched();
> >   	} while (1);
> > -out_ce:
> > +	mutex_unlock(&ce->timeline->mutex);
> > +
> > +	if (*out)
> > +		i915_sw_fence_complete(&(*out)->submit);
> >   	return err;
> >   }
Andi Shyti May 4, 2023, 9:45 a.m. UTC | #4
Hi Tvrtko,

> Another option - maybe - is this related to the revert of fence error
> propagation? If it is, and having that would avoid the need for this invasive
> fix, maybe we unrevert 3761baae908a7b5012be08d70fa553cc2eb82305 with edits
> to limit it to special contexts? If doable...

I think that is not enough, as we anyway want to reach the last
request and fence submitted. Right?

I guess this commit should be reverted anyway.

Andi

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f1987968..668c95af8cbcf 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -742,13 +742,19 @@  intel_context_migrate_copy(struct intel_context *ce,
 			dst_offset = 2 * CHUNK_SZ;
 	}
 
+	/*
+	 * While building the chain of requests, we need to ensure
+	 * that no one can sneak into the timeline unnoticed.
+	 */
+	mutex_lock(&ce->timeline->mutex);
+
 	do {
 		int len;
 
-		rq = i915_request_create(ce);
+		rq = i915_request_create_locked(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -878,10 +884,14 @@  intel_context_migrate_copy(struct intel_context *ce,
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-		if (*out)
+		i915_sw_fence_await(&rq->submit);
+		i915_request_get(rq);
+		i915_request_add_locked(rq);
+		if (*out) {
+			i915_sw_fence_complete(&(*out)->submit);
 			i915_request_put(*out);
-		*out = i915_request_get(rq);
-		i915_request_add(rq);
+		}
+		*out = rq;
 
 		if (err)
 			break;
@@ -905,7 +915,10 @@  intel_context_migrate_copy(struct intel_context *ce,
 		cond_resched();
 	} while (1);
 
-out_ce:
+	mutex_unlock(&ce->timeline->mutex);
+
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }
 
@@ -999,13 +1012,19 @@  intel_context_migrate_clear(struct intel_context *ce,
 	if (HAS_64K_PAGES(i915) && is_lmem)
 		offset = CHUNK_SZ;
 
+	/*
+	 * While building the chain of requests, we need to ensure
+	 * that no one can sneak into the timeline unnoticed.
+	 */
+	mutex_lock(&ce->timeline->mutex);
+
 	do {
 		int len;
 
-		rq = i915_request_create(ce);
+		rq = i915_request_create_locked(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -1056,17 +1075,25 @@  intel_context_migrate_clear(struct intel_context *ce,
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-		if (*out)
+		i915_sw_fence_await(&rq->submit);
+		i915_request_get(rq);
+		i915_request_add_locked(rq);
+		if (*out) {
+			i915_sw_fence_complete(&(*out)->submit);
 			i915_request_put(*out);
-		*out = i915_request_get(rq);
-		i915_request_add(rq);
+		}
+		*out = rq;
+
 		if (err || !it.sg || !sg_dma_len(it.sg))
 			break;
 
 		cond_resched();
 	} while (1);
 
-out_ce:
+	mutex_unlock(&ce->timeline->mutex);
+
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }