
[01/14] drm/amdgpu: fix wrong release of vmid owner

Message ID 1462386415-25600-1-git-send-email-alexander.deucher@amd.com (mailing list archive)
State New, archived

Commit Message

Alex Deucher May 4, 2016, 6:26 p.m. UTC
From: Chunming Zhou <David1.Zhou@amd.com>

The release of the vmid owner was not handled
correctly.  We need to take the lock and walk
the lru list.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Daniel Vetter May 9, 2016, 8:17 a.m. UTC | #1
On Wed, May 04, 2016 at 02:26:42PM -0400, Alex Deucher wrote:
> From: Chunming Zhou <David1.Zhou@amd.com>
> 
> The release of the vmid owner was not handled
> correctly.  We need to take the lock and walk
> the lru list.
> 
> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Monk Liu <monk.liu@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

I know that it's super hard to get former proprietary driver teams to
stick their heads out on public mailing lists. But imo being a steward for
them is totally the worst case option you can pick long term. It means you
keep all the frustration of them not being fully in control (because
sometimes other people from outside the company jump in), never learning
how to drive the process themselves. And from the community pov it just
looks like a code-drop over the wall. In my experience (I've been trying to
pull this off in public for almost 4 years now) trying to make exceptions
to get folks started just doesn't help anyone.

Imo contributors need to fend for their patches themselves (with you
helping behind the scenes ofc) from the start.

Cheers, Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 856116a..e06d066 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1454,6 +1454,7 @@ error_free_sched_entity:
>  void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  {
>  	struct amdgpu_bo_va_mapping *mapping, *tmp;
> +	struct amdgpu_vm_id *id, *id_tmp;
>  	int i;
>  
>  	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
> @@ -1478,14 +1479,17 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  	amdgpu_bo_unref(&vm->page_directory);
>  	fence_put(vm->page_directory_fence);
>  
> -	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> -		struct amdgpu_vm_id *id = vm->ids[i];
> -
> +	mutex_lock(&adev->vm_manager.lock);
> +	list_for_each_entry_safe(id, id_tmp, &adev->vm_manager.ids_lru,
> +				 list) {
>  		if (!id)
>  			continue;
> -
> -		atomic_long_cmpxchg(&id->owner, (long)vm, 0);
> +		if (atomic_long_read(&id->owner) == (long)vm) {
> +			atomic_long_set(&id->owner, 0);
> +			id->pd_gpu_addr = 0;
> +		}
>  	}
> +	mutex_unlock(&adev->vm_manager.lock);
>  }
>  
>  /**
> -- 
> 2.5.5
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Dave Airlie May 10, 2016, 5:05 a.m. UTC | #2
On 9 May 2016 at 18:17, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, May 04, 2016 at 02:26:42PM -0400, Alex Deucher wrote:
>> From: Chunming Zhou <David1.Zhou@amd.com>
>>
>> The release of the vmid owner was not handled
>> correctly.  We need to take the lock and walk
>> the lru list.
>>
>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>> Reviewed-by: Monk Liu <monk.liu@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> I know that it's super hard to get former proprietary driver teams to
> stick their heads out on public mailing lists. But imo being a steward for
> them is totally the worst case option you can pick long term. It means you
> keep all the frustration of them not being fully in control (because
> sometimes other people from outside the company jump in), never learning
> how to drive the process themselves. And from the community pov it just
> looks like code-drop over the wall. In my experience (I've been trying to
> pull this off in public for almost 4 years now) trying to make exceptions
> to get folks started just doesn't help anyone.
>
> Imo contributors need to fend for their patches themselves (with you
> helping behind the scenes ofc) from the start.

I'd prefer this as well. I'd also prefer that at least people who do
develop upstream, like Christian, be set free to do so again; having
patches spring on the lists fully formed isn't really great.

It would also be nice if there was more external discussion around
design decisions etc., getting the internal patch debate onto the
mailing list.

Because at this rate I've no idea about most of the design internals
of the VM stuff, and I really think you guys can do a lot better.

Maybe set up another mailing list like intel-gfx for this, so it's a
bit more private than dri-devel, but what is happening at the moment
is worse than what used to happen.

Dave.
Christian König May 10, 2016, 8:21 a.m. UTC | #3
Am 10.05.2016 um 07:05 schrieb Dave Airlie:
> On 9 May 2016 at 18:17, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, May 04, 2016 at 02:26:42PM -0400, Alex Deucher wrote:
>>> From: Chunming Zhou <David1.Zhou@amd.com>
>>>
>>> The release of the vmid owner was not handled
>>> correctly.  We need to take the lock and walk
>>> the lru list.
>>>
>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>> Reviewed-by: Monk Liu <monk.liu@amd.com>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> I know that it's super hard to get former proprietary driver teams to
>> stick their heads out on public mailing lists. But imo being a steward for
>> them is totally the worst case option you can pick long term. It means you
>> keep all the frustration of them not being fully in control (because
>> sometimes other people from outside the company jump in), never learning
>> how to drive the process themselves. And from the community pov it just
>> looks like code-drop over the wall. In my experience (I've been trying to
>> pull this off in public for almost 4 years now) trying to make exceptions
>> to get folks started just doesn't help anyone.
>>
>> Imo contributors need to fend for their patches themselves (with you
>> helping behind the scenes ofc) from the start.
> I'd prefer this as well, I'd also prefer at least people who do develop upstream
> like Christian be set free to do so again, having patches spring on
> the lists fully
> formed isn't really great.

Yeah, completely agree.

I actually tried to work on the public list again, especially since I'm
working on TTM improvements that would make things much easier.

The problem is that internal people then started to complain that some
patches only got reviewed and merged upstream, but not internally.

>
> It would also be nice if there was more external discussion around
> design decisions etc,
> get the internal patch debate onto the mailing list.

Yeah, again completely agree. Alex and I have already spent a lot of
effort trying to explain the difference between releasing code to the
public and making code open source.

>
> Because at this rate I've no idea about most of the design internals
> of the VM stuff,
> and I really think you guys can do a lot better.
>
> Maybe set up another mailing list like intel-gfx for this, so
> it's a bit more private
> than dri-devel, but what is happening at the moment is worse than what
> used to happen.

Yeah, that sounds like a good idea to me.

Regards,
Christian.

>
> Dave.
Daniel Vetter May 11, 2016, 7:46 a.m. UTC | #4
On Tue, May 10, 2016 at 10:21:53AM +0200, Christian König wrote:
> Am 10.05.2016 um 07:05 schrieb Dave Airlie:
> >On 9 May 2016 at 18:17, Daniel Vetter <daniel@ffwll.ch> wrote:
> >>On Wed, May 04, 2016 at 02:26:42PM -0400, Alex Deucher wrote:
> >>>From: Chunming Zhou <David1.Zhou@amd.com>
> >>>
> >>>The release of the vmid owner was not handled
> >>>correctly.  We need to take the lock and walk
> >>>the lru list.
> >>>
> >>>Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> >>>Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> >>>Reviewed-by: Monk Liu <monk.liu@amd.com>
> >>>Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>I know that it's super hard to get former proprietary driver teams to
> >>stick their heads out on public mailing lists. But imo being a steward for
> >>them is totally the worst case option you can pick long term. It means you
> >>keep all the frustration of them not being fully in control (because
> >>sometimes other people from outside the company jump in), never learning
> >>how to drive the process themselves. And from the community pov it just
> >>looks like code-drop over the wall. In my experience (I've been trying to
> >>pull this off in public for almost 4 years now) trying to make exceptions
> >>to get folks started just doesn't help anyone.
> >>
> >>Imo contributors need to fend for their patches themselves (with you
> >>helping behind the scenes ofc) from the start.
> >I'd prefer this as well, I'd also prefer at least people who do develop upstream
> >like Christian be set free to do so again, having patches spring on
> >the lists fully
> >formed isn't really great.
> 
> Yeah, completely agree.
> 
> I actually tried to work on the public list again. Especially since I'm
> working on TTM improvements that would make things much easier.
> 
> The problem is that then internal people started to complain that some
> patches only got reviewed and merged upstream, but not internally.

Here at Intel there's no internal review for patches, except things which
are embargoed. And even there the internal review is very often just a
"you can split out this refactoring step, please submit that part to
upstream directly".

> >It would also be nice if there was more external discussion around
> >design decisions etc,
> >get the internal patch debate onto the mailing list.
> 
> Yeah, again completely agree. Alex and I have already spent a lot of effort
> trying to explain the difference between releasing code to the public and
> making code open source.
> 
> >Because at this rate I've no idea about most of the design internals
> >of the VM stuff,
> >and I really think you guys can do a lot better.
> >
> >Maybe set up another mailing list like intel-gfx for this, so
> >it's a bit more private
> >than dri-devel, but what is happening at the moment is worse than what
> >used to happen.
> 
> Yeah, that sounds like a good idea to me.

One thing I do to force the issue, and it's a bit of an asshole move, is to
just not merge or review anything posted to internal lists that should go
upstream. At least with repeat offenders; new people get 2-3 reminders of
how it's supposed to work. But once people realize that they're just wasting
their own time, they tend to learn much faster ;-)

The other thing I learned is that you'll not win any popularity contests as
an upstream maintainer trying to get a proprietary team to open up :(

Good luck!

Cheers, Daniel
Dave Airlie May 11, 2016, 7:48 a.m. UTC | #5
On 11 May 2016 at 17:46, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, May 10, 2016 at 10:21:53AM +0200, Christian König wrote:
>> Am 10.05.2016 um 07:05 schrieb Dave Airlie:
>> >On 9 May 2016 at 18:17, Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>On Wed, May 04, 2016 at 02:26:42PM -0400, Alex Deucher wrote:
>> >>>From: Chunming Zhou <David1.Zhou@amd.com>
>> >>>
>> >>>The release of the vmid owner was not handled
>> >>>correctly.  We need to take the lock and walk
>> >>>the lru list.
>> >>>
>> >>>Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>> >>>Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>> >>>Reviewed-by: Monk Liu <monk.liu@amd.com>
>> >>>Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> >>I know that it's super hard to get former proprietary driver teams to
>> >>stick their heads out on public mailing lists. But imo being a steward for
>> >>them is totally the worst case option you can pick long term. It means you
>> >>keep all the frustration of them not being fully in control (because
>> >>sometimes other people from outside the company jump in), never learning
>> >>how to drive the process themselves. And from the community pov it just
>> >>looks like code-drop over the wall. In my experience (I've been trying to
>> >>pull this off in public for almost 4 years now) trying to make exceptions
>> >>to get folks started just doesn't help anyone.
>> >>
>> >>Imo contributors need to fend for their patches themselves (with you
>> >>helping behind the scenes ofc) from the start.
>> >I'd prefer this as well, I'd also prefer at least people who do develop upstream
>> >like Christian be set free to do so again, having patches spring on
>> >the lists fully
>> >formed isn't really great.
>>
>> Yeah, completely agree.
>>
>> I actually tried to work on the public list again. Especially since I'm
>> working on TTM improvements that would make things much easier.
>>
>> The problem is that then internal people started to complain that some
>> patches only got reviewed and merged upstream, but not internally.
>
> Here at intel there's no internal review for patches, except things which
> are embargoed. And even there the internal review is very often just a
> "you can split out this refactoring step, please submit that part to
> upstream directly".
>
>> >It would also be nice if there was more external discussion around
>> >design decisions etc,
>> >get the internal patch debate onto the mailing list.
>>
>> Yeah, again completely agree. Alex and I have already spent a lot of effort
>> trying to explain the difference between releasing code to the public and
>> making code open source.
>>
>> >Because at this rate I've no idea about most of the design internals
>> >of the VM stuff,
>> >and I really think you guys can do a lot better.
>> >
>> >Maybe set up another mailing list like intel-gfx for this, so
>> >it's a bit more private
>> >than dri-devel, but what is happening at the moment is worse than what
>> >used to happen.
>>
>> Yeah, that sounds like a good idea to me.
>
> One thing I do to force the issue, and it's a bit of an asshole move, is to
> just not merge or review anything posted to internal lists that should go
> upstream. At least with repeat offenders, new people get 2-3 reminders how
> it's supposed to work. But once people realize that they're just wasting
> their own time, they tend to learn much faster ;-)
>
> The other thing I learned is that you'll not win any popularity contest as
> upstream maintainer trying to get a proprietary team to open up :(

My feeling is things haven't gotten any better; in fact they may have
gotten worse, if Christian is having such problems.

At this point I'm guessing you are going to have to use a stick rather
than carrots, unfortunately.

Nobody seems to want to work in the open without encouragement.

Dave.

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 856116a..e06d066 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1454,6 +1454,7 @@  error_free_sched_entity:
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
 	struct amdgpu_bo_va_mapping *mapping, *tmp;
+	struct amdgpu_vm_id *id, *id_tmp;
 	int i;
 
 	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
@@ -1478,14 +1479,17 @@  void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	amdgpu_bo_unref(&vm->page_directory);
 	fence_put(vm->page_directory_fence);
 
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-		struct amdgpu_vm_id *id = vm->ids[i];
-
+	mutex_lock(&adev->vm_manager.lock);
+	list_for_each_entry_safe(id, id_tmp, &adev->vm_manager.ids_lru,
+				 list) {
 		if (!id)
 			continue;
-
-		atomic_long_cmpxchg(&id->owner, (long)vm, 0);
+		if (atomic_long_read(&id->owner) == (long)vm) {
+			atomic_long_set(&id->owner, 0);
+			id->pd_gpu_addr = 0;
+		}
 	}
+	mutex_unlock(&adev->vm_manager.lock);
 }
 
 /**