[06/19] drm/vmwgfx: Drop the cursor locking hack

Message ID 20170322215058.8671-7-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived

Commit Message

Daniel Vetter March 22, 2017, 9:50 p.m. UTC
It's been around forever, no one bothered to address the FIXME, so I
presume it's all fine.

Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 25 -------------------------
 1 file changed, 25 deletions(-)

Comments

Thomas Hellstrom March 23, 2017, 6:22 a.m. UTC | #1
On 03/22/2017 10:50 PM, Daniel Vetter wrote:
> It's been around forever, no one bothered to address the FIXME, so I
> presume it's all fine.
>
> Cc: Sinclair Yeh <syeh@vmware.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

NAK. We need to properly address this. Probably as part of the atomic
update.
/Thomas



> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 25 -------------------------
>  1 file changed, 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index d492d57d5309..424b3fc57203 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -148,15 +148,6 @@ int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
>  	s32 hotspot_x, hotspot_y;
>  	int ret;
>  
> -	/*
> -	 * FIXME: Unclear whether there's any global state touched by the
> -	 * cursor_set function, especially vmw_cursor_update_position looks
> -	 * suspicious. For now take the easy route and reacquire all locks. We
> -	 * can do this since the caller in the drm core doesn't check anything
> -	 * which is protected by any looks.
> -	 */
> -	drm_modeset_unlock_crtc(crtc);
> -	drm_modeset_lock_all(dev_priv->dev);
>  	hotspot_x = hot_x + du->hotspot_x;
>  	hotspot_y = hot_y + du->hotspot_y;
>  
> @@ -224,9 +215,6 @@ int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
>  	}
>  
>  out:
> -	drm_modeset_unlock_all(dev_priv->dev);
> -	drm_modeset_lock_crtc(crtc, crtc->cursor);
> -
>  	return ret;
>  }
>  
> @@ -239,25 +227,12 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
>  	du->cursor_x = x + du->set_gui_x;
>  	du->cursor_y = y + du->set_gui_y;
>  
> -	/*
> -	 * FIXME: Unclear whether there's any global state touched by the
> -	 * cursor_set function, especially vmw_cursor_update_position looks
> -	 * suspicious. For now take the easy route and reacquire all locks. We
> -	 * can do this since the caller in the drm core doesn't check anything
> -	 * which is protected by any looks.
> -	 */
> -	drm_modeset_unlock_crtc(crtc);
> -	drm_modeset_lock_all(dev_priv->dev);
> -
>  	vmw_cursor_update_position(dev_priv, shown,
>  				   du->cursor_x + du->hotspot_x +
>  				   du->core_hotspot_x,
>  				   du->cursor_y + du->hotspot_y +
>  				   du->core_hotspot_y);
>  
> -	drm_modeset_unlock_all(dev_priv->dev);
> -	drm_modeset_lock_crtc(crtc, crtc->cursor);
> -
>  	return 0;
>  }
>
Daniel Vetter March 23, 2017, 7:28 a.m. UTC | #2
On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
> > It's been around forever, no one bothered to address the FIXME, so I
> > presume it's all fine.
> >
> > Cc: Sinclair Yeh <syeh@vmware.com>
> > Cc: Thomas Hellstrom <thellstrom@vmware.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> NAK. We need to properly address this. Probably as part of the atomic
> update.

So could someone with vmwgfx understanding explain this? Note that the
FIXME was originally added by me years ago, because I wasn't sure (only
about 90%) that this is safe, and was essentially pleading for a vmwgfx
expert to review this.

Since it didn't happen I presume it's not that terrible and probably safe
...

I'm still 90% sure that this is correct, but I'd love for a vmwgfx expert
to audit it. Replying with a NAK is kinda not the response I was hoping for
(and yes I guess I should have explained what's going on here better, but
it's just a git blame of the FIXME comment away).

Thanks,

Daniel

> /Thomas
> 
> 
> 
> > ---
> >  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 25 -------------------------
> >  1 file changed, 25 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > index d492d57d5309..424b3fc57203 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > @@ -148,15 +148,6 @@ int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
> >  	s32 hotspot_x, hotspot_y;
> >  	int ret;
> >  
> > -	/*
> > -	 * FIXME: Unclear whether there's any global state touched by the
> > -	 * cursor_set function, especially vmw_cursor_update_position looks
> > -	 * suspicious. For now take the easy route and reacquire all locks. We
> > -	 * can do this since the caller in the drm core doesn't check anything
> > -	 * which is protected by any looks.
> > -	 */
> > -	drm_modeset_unlock_crtc(crtc);
> > -	drm_modeset_lock_all(dev_priv->dev);
> >  	hotspot_x = hot_x + du->hotspot_x;
> >  	hotspot_y = hot_y + du->hotspot_y;
> >  
> > @@ -224,9 +215,6 @@ int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
> >  	}
> >  
> >  out:
> > -	drm_modeset_unlock_all(dev_priv->dev);
> > -	drm_modeset_lock_crtc(crtc, crtc->cursor);
> > -
> >  	return ret;
> >  }
> >  
> > @@ -239,25 +227,12 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
> >  	du->cursor_x = x + du->set_gui_x;
> >  	du->cursor_y = y + du->set_gui_y;
> >  
> > -	/*
> > -	 * FIXME: Unclear whether there's any global state touched by the
> > -	 * cursor_set function, especially vmw_cursor_update_position looks
> > -	 * suspicious. For now take the easy route and reacquire all locks. We
> > -	 * can do this since the caller in the drm core doesn't check anything
> > -	 * which is protected by any looks.
> > -	 */
> > -	drm_modeset_unlock_crtc(crtc);
> > -	drm_modeset_lock_all(dev_priv->dev);
> > -
> >  	vmw_cursor_update_position(dev_priv, shown,
> >  				   du->cursor_x + du->hotspot_x +
> >  				   du->core_hotspot_x,
> >  				   du->cursor_y + du->hotspot_y +
> >  				   du->core_hotspot_y);
> >  
> > -	drm_modeset_unlock_all(dev_priv->dev);
> > -	drm_modeset_lock_crtc(crtc, crtc->cursor);
> > -
> >  	return 0;
> >  }
> >  
> 
>
Daniel Vetter March 23, 2017, 7:31 a.m. UTC | #3
On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
> > On 03/22/2017 10:50 PM, Daniel Vetter wrote:
> > > It's been around forever, no one bothered to address the FIXME, so I
> > > presume it's all fine.
> > >
> > > Cc: Sinclair Yeh <syeh@vmware.com>
> > > Cc: Thomas Hellstrom <thellstrom@vmware.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > 
> > NAK. We need to properly address this. Probably as part of the atomic
> > update.
> 
> So could someone with vmwgfx understanding explain this? Note that the
> FIXME was originally added by me years ago, because I wasn't sure (only
> about 90%) that this is safe, and was essentially pleading for a vmwgfx
> expert to review this?
> 
> Since it didn't happen I presume it's not that terribly and probably safe
> ...
> 
> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
> audit it. Replying with a NAK is kinda not the response I was hoping for
> (and yes I guess I should have explained what's going on here better, but
> it's just a git blame of the FIXME comment away).

Bit more context even: This lock dropping dance is _not_ safe from a drm
core perspective. But when I did the original kms locking rework, the
tradeoff between upsetting core state a bit and totally breaking vmwgfx
leaned towards not breaking vmwgfx. And iirc you or Syeh promised to look
at this and then remove the FIXME, maybe with a vmwgfx lock/unlock added
if there's a gap (I looked and didn't find one, but I don't really
understand vmwgfx in detail).
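
A minimal userspace sketch of why dropping and retaking a caller's lock
inside a hook is risky. This is not DRM core or vmwgfx code: every name
below is hypothetical, the lock is a plain flag, and `enabled` is only a
placeholder for whatever state the caller sampled under the lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a crtc: a lock flag plus one piece of state. */
struct sketch_crtc {
	bool locked;
	bool enabled;
};

/* Stands in for any other thread that runs while the lock is dropped. */
typedef void (*sketch_intruder_fn)(struct sketch_crtc *crtc);

/* Driver hook doing the "dance": drop the caller's lock, then retake it. */
static void sketch_hook_with_dance(struct sketch_crtc *crtc,
				   sketch_intruder_fn intruder)
{
	crtc->locked = false;		/* caller's lock is gone here */
	if (intruder)
		intruder(crtc);		/* window in which state can change */
	crtc->locked = true;		/* reacquired, but state may differ */
}

/*
 * Core-side caller: samples state under the lock, calls the hook, and
 * reports whether what it sampled still holds afterwards.
 */
static bool sketch_core_call(struct sketch_crtc *crtc,
			     sketch_intruder_fn intruder)
{
	bool sampled, still_valid;

	crtc->locked = true;
	sampled = crtc->enabled;
	sketch_hook_with_dance(crtc, intruder);
	still_valid = (crtc->enabled == sampled);
	crtc->locked = false;

	return still_valid;
}

/* Example intruder: disables the crtc, but only while the lock is dropped. */
static void sketch_disable(struct sketch_crtc *crtc)
{
	if (!crtc->locked)
		crtc->enabled = false;
}
```

With no intruder the sampled state survives the call. With
`sketch_disable` as the intruder it does not, even though the caller
believed it held the lock throughout.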

Thanks, Daniel
Thomas Hellstrom March 23, 2017, 8:35 a.m. UTC | #4
Hi, Daniel,

On 03/23/2017 08:31 AM, Daniel Vetter wrote:
> On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
>> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
>>> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
>>>> It's been around forever, no one bothered to address the FIXME, so I
>>>> presume it's all fine.
>>>>
>>>> Cc: Sinclair Yeh <syeh@vmware.com>
>>>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> NAK. We need to properly address this. Probably as part of the atomic
>>> update.
>> So could someone with vmwgfx understanding explain this? Note that the
>> FIXME was originally added by me years ago, because I wasn't sure (only
>> about 90%) that this is safe, and was essentially pleading for a vmwgfx
>> expert to review this?
>>
>> Since it didn't happen I presume it's not that terribly and probably safe
>> ...
>>
>> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
>> audit it. Replying with a NAK is kinda not the response I was hoping for
>> (and yes I guess I should have explained what's going on here better, but
>> it's just a git blame of the FIXME comment away).

So the code has been left in place because it works. Altering it now
will create unnecessary merge conflicts with the atomic code, and the
change isn't tested and audited which means we need to drop focus from
what we're doing and audit and test code that isn't going to be used
anyway for no apparent reason? But otoh put in the below context there
indeed is a reason.

From a quick audit of the existing code it seems like at least
vmw_cursor_update_position is touching global device state so I think at
a minimum we need to take a spinlock in that function. Otherwise it
seems to be safe.

But I prefer if we can do that as part of the atomic update?

Thanks,
Thomas


> Bit more context even: This lock dropping dance is _not_ safe from a drm
> core perspective. But when I've done the original kms locking rework the
> tradeoff between upsetting core state a bit and totally breaking vmwgfx
> leaned towards not breaking vmwgfx. And iirc you or Syeh promised to look
> at this and then either remove the FIXME, maybe with a vmwgfx lock/unlock
> added if there's a gap (I looked, didn't find one, but I don't understand
> vmwgfx in details really).
>
> Thanks, Daniel
Daniel Vetter March 23, 2017, 10:10 a.m. UTC | #5
On Thu, Mar 23, 2017 at 09:35:25AM +0100, Thomas Hellstrom wrote:
> Hi, Daniel,
> 
> On 03/23/2017 08:31 AM, Daniel Vetter wrote:
> > On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
> >> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
> >>> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
> >>>> It's been around forever, no one bothered to address the FIXME, so I
> >>>> presume it's all fine.
> >>>>
> >>>> Cc: Sinclair Yeh <syeh@vmware.com>
> >>>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> >>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>> NAK. We need to properly address this. Probably as part of the atomic
> >>> update.
> >> So could someone with vmwgfx understanding explain this? Note that the
> >> FIXME was originally added by me years ago, because I wasn't sure (only
> >> about 90%) that this is safe, and was essentially pleading for a vmwgfx
> >> expert to review this?
> >>
> >> Since it didn't happen I presume it's not that terribly and probably safe
> >> ...
> >>
> >> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
> >> audit it. Replying with a NAK is kinda not the response I was hoping for
> >> (and yes I guess I should have explained what's going on here better, but
> >> it's just a git blame of the FIXME comment away).
> 
> So the code has been left in place because it works. Altering it now
> will create unnecessary merge conflicts with the atomic code, and the
> change isn't tested and audited which means we need to drop focus from
> what we're doing and audit and test code that isn't going to be used
> anyway for not apparent reason? But otoh put in the below context there
> indeed is a reason.
> 
> From a quick audit of the existing code it seems like at least
> vmw_cursor_update_position is touching global device state so I think at
> a minimum we need to take a spinlock in that function. Otherwise it
> seems to be safe.

Note that you're holding the crtc lock already, which gives you exclusion
against concurrent page_flips, mode_sets and property changes. Note also
that page_flips themselves also only hold the crtc lock, so you can run
multiple page_flips in parallel on different crtc (iirc vmwgfx has
multiple crtc, if not this discussion is entirely moot).
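
A rough userspace model of that exclusion, assuming the lock-everything
path has to take every crtc lock (which is my understanding of
drm_modeset_lock_all as far as crtc locks go). The real locks block
instead of failing; the try-lock flags and all `sketch_` names here are
made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_CRTCS 2

/* Hypothetical device with one lock flag per crtc. */
struct sketch_dev {
	bool crtc_locked[SKETCH_NUM_CRTCS];
};

static bool sketch_trylock_crtc(struct sketch_dev *dev, size_t i)
{
	if (dev->crtc_locked[i])
		return false;
	dev->crtc_locked[i] = true;
	return true;
}

static void sketch_unlock_crtc(struct sketch_dev *dev, size_t i)
{
	dev->crtc_locked[i] = false;
}

/* Models a lock-everything path: succeeds only if every crtc lock is free. */
static bool sketch_trylock_all(struct sketch_dev *dev)
{
	for (size_t i = 0; i < SKETCH_NUM_CRTCS; i++) {
		if (!sketch_trylock_crtc(dev, i)) {
			/* Back off: release the locks taken so far. */
			while (i-- > 0)
				sketch_unlock_crtc(dev, i);
			return false;
		}
	}
	return true;
}
```

Two callers can hold crtc 0 and crtc 1 at the same time, which is the
parallel page_flip case. While either lock is held,
`sketch_trylock_all()` fails, so anything that needs all locks is kept
out, but nothing stops the two per-crtc holders from touching shared
device state concurrently.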

tbh I'd be surprised if my patch really breaks something that hasn't been
a pre-existing issue for a long time. The original commit which added this
FIXME comment is from 2012. Note also that because it's a hack, you
already have a pretty real race with the core drm state keeping, and no
one seems to have hit that either.

I mean I can dig through vmwgfx code and do the audit, but it'll take a
few hours and vmwgfx is its own world, so much harder to understand (for
me).

> But I prefer if we can do that as part of the atomic update?

When does that vmwgfx atomic happen?
-Daniel
Thomas Hellstrom March 23, 2017, 10:32 a.m. UTC | #6
On 03/23/2017 11:10 AM, Daniel Vetter wrote:
> On Thu, Mar 23, 2017 at 09:35:25AM +0100, Thomas Hellstrom wrote:
>> Hi, Daniel,
>>
>> On 03/23/2017 08:31 AM, Daniel Vetter wrote:
>>> On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
>>>> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
>>>>> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
>>>>>> It's been around forever, no one bothered to address the FIXME, so I
>>>>>> presume it's all fine.
>>>>>>
>>>>>> Cc: Sinclair Yeh <syeh@vmware.com>
>>>>>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>> NAK. We need to properly address this. Probably as part of the atomic
>>>>> update.
>>>> So could someone with vmwgfx understanding explain this? Note that the
>>>> FIXME was originally added by me years ago, because I wasn't sure (only
>>>> about 90%) that this is safe, and was essentially pleading for a vmwgfx
>>>> expert to review this?
>>>>
>>>> Since it didn't happen I presume it's not that terribly and probably safe
>>>> ...
>>>>
>>>> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
>>>> audit it. Replying with a NAK is kinda not the response I was hoping for
>>>> (and yes I guess I should have explained what's going on here better, but
>>>> it's just a git blame of the FIXME comment away).
>> So the code has been left in place because it works. Altering it now
>> will create unnecessary merge conflicts with the atomic code, and the
>> change isn't tested and audited which means we need to drop focus from
>> what we're doing and audit and test code that isn't going to be used
>> anyway for not apparent reason? But otoh put in the below context there
>> indeed is a reason.
>>
>> From a quick audit of the existing code it seems like at least
>> vmw_cursor_update_position is touching global device state so I think at
>> a minimum we need to take a spinlock in that function. Otherwise it
>> seems to be safe.
> Note that you're holding the crtc lock already, which gives you exclusion
> against concurrent page_flips, mode_sets and property changes. Note also
> that page_flips themselves also only hold the crtc lock, so you can run
> multiple page_flips in parallel on different crtc (iirc vmwgfx has
> multiple crtc, if not this discussion is entirely moot).
>
> tbh I'd be surprised if my patch really breaks something that hasn't been
> a pre-existing issue for a long time. The original commit which added this
> FIXME comment is from 2012. Note also that because it's a hack, you
> already have a pretty a real race with the core drm state keeping, and no
> one seems to have hit that either.
>
> I mean I can dig through vmwgfx code and do the audit, but it'll take a
> few hours and vmwgfx is it's own world, so much harder to understand (for
> me).
>

I'm thinking of the situation where someone calls the cursor_set ioctl in
parallel for two crtcs at the same time, racing to write the position
registers. Note that the device has only a single global cursor.
Admittedly the effects of a race would probably be small, but I'd rather
see it properly protected.
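
To make the race concrete, here is a small userspace replay of one
possible interleaving. It is only a sketch: it assumes the position is
written as two separate values (x, then y), and the struct and function
names are hypothetical, not the vmwgfx ones.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device's single, global cursor registers. */
struct sketch_cursor_regs {
	int x;
	int y;
};

/* One requested position; applying it takes two register writes. */
struct sketch_cursor_update {
	int x;
	int y;
};

/*
 * Replay two updates with a fixed interleaving: A writes x, B writes x
 * and y, then A writes y. This is one ordering that can occur when
 * nothing serializes the two callers.
 */
static void sketch_replay_interleaved(struct sketch_cursor_regs *regs,
				      struct sketch_cursor_update a,
				      struct sketch_cursor_update b)
{
	regs->x = a.x;	/* caller A, first write */
	regs->x = b.x;	/* caller B, first write */
	regs->y = b.y;	/* caller B, second write */
	regs->y = a.y;	/* caller A, second write */
}

/* True if the registers hold a position that neither caller asked for. */
static bool sketch_is_torn(const struct sketch_cursor_regs *regs,
			   struct sketch_cursor_update a,
			   struct sketch_cursor_update b)
{
	bool matches_a = regs->x == a.x && regs->y == a.y;
	bool matches_b = regs->x == b.x && regs->y == b.y;

	return !matches_a && !matches_b;
}
```

With A = (10, 20) and B = (300, 400) the registers end up at (300, 20),
a position neither caller requested. If both callers ask for the same
position the interleaving is harmless.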

>> But I prefer if we can do that as part of the atomic update?
> When does that vmwgfx atomic happen?
> -Daniel

We're targeting 4.12, which means the code that is currently under
testing will need to be sent out for review pretty soon.
It's already in our standalone testing repo at

git://git.freedesktop.org/git/mesa/vmwgfx

but the cursor code hasn't been fixed in that repo yet.

BTW is this blocking some other core drm work you're doing?

Thanks,

/Thomas
Daniel Vetter March 23, 2017, 12:56 p.m. UTC | #7
On Thu, Mar 23, 2017 at 11:32:49AM +0100, Thomas Hellstrom wrote:
> On 03/23/2017 11:10 AM, Daniel Vetter wrote:
> > On Thu, Mar 23, 2017 at 09:35:25AM +0100, Thomas Hellstrom wrote:
> >> Hi, Daniel,
> >>
> >> On 03/23/2017 08:31 AM, Daniel Vetter wrote:
> >>> On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
> >>>> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
> >>>>> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
> >>>>>> It's been around forever, no one bothered to address the FIXME, so I
> >>>>>> presume it's all fine.
> >>>>>>
> >>>>>> Cc: Sinclair Yeh <syeh@vmware.com>
> >>>>>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> >>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>>> NAK. We need to properly address this. Probably as part of the atomic
> >>>>> update.
> >>>> So could someone with vmwgfx understanding explain this? Note that the
> >>>> FIXME was originally added by me years ago, because I wasn't sure (only
> >>>> about 90%) that this is safe, and was essentially pleading for a vmwgfx
> >>>> expert to review this?
> >>>>
> >>>> Since it didn't happen I presume it's not that terribly and probably safe
> >>>> ...
> >>>>
> >>>> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
> >>>> audit it. Replying with a NAK is kinda not the response I was hoping for
> >>>> (and yes I guess I should have explained what's going on here better, but
> >>>> it's just a git blame of the FIXME comment away).
> >> So the code has been left in place because it works. Altering it now
> >> will create unnecessary merge conflicts with the atomic code, and the
> >> change isn't tested and audited which means we need to drop focus from
> >> what we're doing and audit and test code that isn't going to be used
> >> anyway for not apparent reason? But otoh put in the below context there
> >> indeed is a reason.
> >>
> >> From a quick audit of the existing code it seems like at least
> >> vmw_cursor_update_position is touching global device state so I think at
> >> a minimum we need to take a spinlock in that function. Otherwise it
> >> seems to be safe.
> > Note that you're holding the crtc lock already, which gives you exclusion
> > against concurrent page_flips, mode_sets and property changes. Note also
> > that page_flips themselves also only hold the crtc lock, so you can run
> > multiple page_flips in parallel on different crtc (iirc vmwgfx has
> > multiple crtc, if not this discussion is entirely moot).
> >
> > tbh I'd be surprised if my patch really breaks something that hasn't been
> > a pre-existing issue for a long time. The original commit which added this
> > FIXME comment is from 2012. Note also that because it's a hack, you
> > already have a pretty a real race with the core drm state keeping, and no
> > one seems to have hit that either.
> >
> > I mean I can dig through vmwgfx code and do the audit, but it'll take a
> > few hours and vmwgfx is it's own world, so much harder to understand (for
> > me).
> >
> 
> I'm thinking of the situation when someone would call a cursor_set ioctl
> in parallell
> for two crtcs at the same time and race writing the position registers?
> Note that the device has only a single global cursor.
> Admittedly the effects of a race would probably be small, but I'd rather
> see it being
> properly protected.

Hm, didn't realize you only have 1 cursor for everything together. In that
case you indeed have a problem. Not sure why that didn't come up 4 years
ago with the original patch, would be pretty easy to add a quick mutex in
v2 ... Since read-only global state is perfectly fine, having the crtc
lock gives you a read-only global state lock (for legacy drivers at least,
not for atomic).
>
> >> But I prefer if we can do that as part of the atomic update?
> > When does that vmwgfx atomic happen?
> 
> We're targeting 4.12, which means the code that is currently under
> testing will need to be sent out for review pretty soon.
> It's already in our standalone testing repo at
> 
> git://git.freedesktop.org/git/mesa/vmwgfx

Deadline is in 2 weeks for 4.12 feature work, per the discussion we've had
after the 4.11 merge window fallout with Linus. You pretty much have to
submit the patches now to have a reasonable chance of them landing in
time. Since vmwgfx has traditionally been the odd kms driver out I'd
really like to give the new atomic code at least a quick read-through, to
make sure it's aligned as much as possible with the other 20+ atomic
drivers.

> but the cursor code hasn't been fixed in that repo yet.

Well if you switched to universal planes it's pretty easy to fix with the
acquire ctx and grabbing mode_config.connection_mutex. Without that you
can just add a global cursor mutex (equally few lines) to patch it up.
> 
> BTW is this blocking some other core drm work you're doing?

Just removing lock_crtc and preventing abuse from spreading. Somehow both
tegra and tilcdc started using it in places it was definitely not meant
for. vmwgfx (with this FIXME here) was the only legit user of this
function. So not high priority really, but something that'd be really nice
to remove from the exported set of functions to prevent future misuse by
new drivers.

Thanks, Daniel
Michel Dänzer March 27, 2017, 3:01 a.m. UTC | #8
On 23/03/17 07:32 PM, Thomas Hellstrom wrote:
> On 03/23/2017 11:10 AM, Daniel Vetter wrote:
>> On Thu, Mar 23, 2017 at 09:35:25AM +0100, Thomas Hellstrom wrote:
>>> On 03/23/2017 08:31 AM, Daniel Vetter wrote:
>>>> On Thu, Mar 23, 2017 at 08:28:32AM +0100, Daniel Vetter wrote:
>>>>> On Thu, Mar 23, 2017 at 07:22:31AM +0100, Thomas Hellstrom wrote:
>>>>>> On 03/22/2017 10:50 PM, Daniel Vetter wrote:
>>>>>>> It's been around forever, no one bothered to address the FIXME, so I
>>>>>>> presume it's all fine.
>>>>>>>
>>>>>>> Cc: Sinclair Yeh <syeh@vmware.com>
>>>>>>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>> NAK. We need to properly address this. Probably as part of the atomic
>>>>>> update.
>>>>> So could someone with vmwgfx understanding explain this? Note that the
>>>>> FIXME was originally added by me years ago, because I wasn't sure (only
>>>>> about 90%) that this is safe, and was essentially pleading for a vmwgfx
>>>>> expert to review this?
>>>>>
>>>>> Since it didn't happen I presume it's not that terribly and probably safe
>>>>> ...
>>>>>
>>>>> I'm still 90% sure that this is correct, but I'd love for a vmwgfx to
>>>>> audit it. Replying with a NAK is kinda not the response I was hoping for
>>>>> (and yes I guess I should have explained what's going on here better, but
>>>>> it's just a git blame of the FIXME comment away).
>>> So the code has been left in place because it works. Altering it now
>>> will create unnecessary merge conflicts with the atomic code, and the
>>> change isn't tested and audited which means we need to drop focus from
>>> what we're doing and audit and test code that isn't going to be used
>>> anyway for not apparent reason? But otoh put in the below context there
>>> indeed is a reason.
>>>
>>> From a quick audit of the existing code it seems like at least
>>> vmw_cursor_update_position is touching global device state so I think at
>>> a minimum we need to take a spinlock in that function. Otherwise it
>>> seems to be safe.
>> Note that you're holding the crtc lock already, which gives you exclusion
>> against concurrent page_flips, mode_sets and property changes. Note also
>> that page_flips themselves also only hold the crtc lock, so you can run
>> multiple page_flips in parallel on different crtc (iirc vmwgfx has
>> multiple crtc, if not this discussion is entirely moot).
>>
>> tbh I'd be surprised if my patch really breaks something that hasn't been
>> a pre-existing issue for a long time. The original commit which added this
>> FIXME comment is from 2012. Note also that because it's a hack, you
>> already have a pretty a real race with the core drm state keeping, and no
>> one seems to have hit that either.
>>
>> I mean I can dig through vmwgfx code and do the audit, but it'll take a
>> few hours and vmwgfx is it's own world, so much harder to understand (for
>> me).
>>
> 
> I'm thinking of the situation when someone would call a cursor_set ioctl
> in parallell for two crtcs at the same time and race writing the
> position registers?
> Note that the device has only a single global cursor.
> Admittedly the effects of a race would probably be small, but I'd rather
> see it being properly protected.

Indeed, as long as userspace uses cursor positions (and images) on each
CRTC which are consistent with a single cursor in a single framebuffer,
it shouldn't matter in which order they write the registers. And if the
per-CRTC positions aren't consistent like that, locking won't help either.
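
The consistency condition can be written down as a small check. This is
a sketch under an assumption: each CRTC scans out of one shared
framebuffer at some origin, and the device-wide cursor position is that
origin plus the CRTC-relative position (the `x + du->set_gui_x` lines in
the patch suggest an offset of this kind). All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_crtc_cursor {
	int origin_x, origin_y;	/* where this CRTC scans out of the framebuffer */
	int cursor_x, cursor_y;	/* CRTC-relative position requested by userspace */
};

/*
 * True if every CRTC's request maps to the same framebuffer position,
 * i.e. the requests describe one cursor and write order does not matter.
 */
static bool sketch_cursors_consistent(const struct sketch_crtc_cursor *c,
				      size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (c[i].origin_x + c[i].cursor_x !=
		    c[0].origin_x + c[0].cursor_x ||
		    c[i].origin_y + c[i].cursor_y !=
		    c[0].origin_y + c[0].cursor_y)
			return false;
	}
	return true;
}
```

For two side-by-side outputs with origins (0, 0) and (1024, 0), the
requests (1100, 50) and (76, 50) both map to (1100, 50) and are
consistent. Change the second request to (100, 50) and the two CRTCs
disagree, which no amount of locking can reconcile.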

Strictly speaking, the (virtual) hardware is too limited to support the
legacy KMS cursor API. AFAIR e.g. weston at least used to make use of HW
cursors for other surfaces, not sure that's currently the case though.
Daniel Vetter March 27, 2017, 6:28 a.m. UTC | #9
We discussed this quickly on irc, transcribing.

On Mon, Mar 27, 2017 at 5:01 AM, Michel Dänzer <michel@daenzer.net> wrote:
> Strictly speaking, the (virtual) hardware is too limited to support the
> legacy KMS cursor API. AFAIR e.g. weston at least used to make use of HW
> cursors for other surfaces, not sure that's currently the case though.

That was disabled again because of lack of atomic (together with all
overlay support if your driver isn't atomic). But atomic/universal
planes allow us to at least model vmwgfx correctly. For each crtc
we'd have one primary plane, but only one global cursor plane that we
attach to the cursor slot of each crtc. Then universal/atomic aware
userspace could realize that there's only 1 cursor plane and make sure
it's not over-used.
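
Roughly, as a data-structure sketch. These are not the real DRM structs,
just hypothetical stand-ins showing one cursor plane shared by all crtcs
and how userspace could notice that by comparing plane identities.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, much simpler than the real DRM objects. */
struct sketch_plane {
	int id;
};

struct sketch_crtc {
	struct sketch_plane *primary;	/* one per crtc */
	struct sketch_plane *cursor;	/* may be shared between crtcs */
};

/* Count how many distinct cursor planes the crtcs actually point at. */
static size_t sketch_count_cursor_planes(const struct sketch_crtc *crtcs,
					 size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < i; j++) {
			if (crtcs[j].cursor == crtcs[i].cursor) {
				seen = true;
				break;
			}
		}
		if (!seen)
			distinct++;
	}
	return distinct;
}
```

If two crtcs report the same cursor plane the count is 1, and a
compositor can budget for a single hardware cursor instead of one per
output.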
-Daniel
Thomas Hellstrom March 27, 2017, 8:31 a.m. UTC | #10
On 03/27/2017 08:28 AM, Daniel Vetter wrote:
> We discussed this quickly on irc, transcribing.
>
> On Mon, Mar 27, 2017 at 5:01 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> Strictly speaking, the (virtual) hardware is too limited to support the
>> legacy KMS cursor API. AFAIR e.g. weston at least used to make use of HW
>> cursors for other surfaces, not sure that's currently the case though.
> That was disabled again because of lack of atomic (together with all
> overlay support if your driver isn't atomic). But atomic/universal
> planes allows us to at least model vmwgfx correctly. For each crtc
> we'd have one primary plane, but only one global cursor plane that we
> attach to the cursor slot of each crtc. Then universal/atomic aware
> userspace could realize that there's only 1 cursor plane and make sure
> it's not over-used.

That sounds encouraging. In practice we haven't really seen any problems
because most users use vmware tools, which places the outputs in such a
way that the cursor location visually coincides for all crtcs. The
problem starts if someone overrides tools and tries to clone the contents
across crtcs. The vmware xorg driver has some logic to try to detect such
situations and fall back to software cursors, and we might at some point
have to implement software cursor composition in the kernel, but for now
we live with the possibility that users will see the cursor jumping
across the screens.

/Thomas



> -Daniel
Daniel Vetter March 29, 2017, 8 a.m. UTC | #11
On Mon, Mar 27, 2017 at 10:31:51AM +0200, Thomas Hellstrom wrote:
> On 03/27/2017 08:28 AM, Daniel Vetter wrote:
> > We discussed this quickly on irc, transcribing.
> >
> > On Mon, Mar 27, 2017 at 5:01 AM, Michel Dänzer <michel@daenzer.net> wrote:
> >> Strictly speaking, the (virtual) hardware is too limited to support the
> >> legacy KMS cursor API. AFAIR e.g. weston at least used to make use of HW
> >> cursors for other surfaces, not sure that's currently the case though.
> > That was disabled again because of lack of atomic (together with all
> > overlay support if your driver isn't atomic). But atomic/universal
> > planes allows us to at least model vmwgfx correctly. For each crtc
> > we'd have one primary plane, but only one global cursor plane that we
> > attach to the cursor slot of each crtc. Then universal/atomic aware
> > userspace could realize that there's only 1 cursor plane and make sure
> > it's not over-used.
> 
> That sounds encouraging. In practice we haven't really seen any problems
> because most users use vmware tools,
> which places the outputs in such a way that the cursor location visually
> coincides for all crtcs.
> The problem starts if someone would override tools and try to clone the
> contents across crtcs.
> The vmware xorg driver has some logic to try to detect such situations
> and fall back to software cursors, and possibly we might have to, at
> some point, implement software cursor composition in the kernel, but for
> now we live with the potential possibilty that users will see the cursor
> jumping across the screens..

Ok, I've pulled in the series, except this patch plus the few cleanups
that depend upon it. I'll respin this as soon as vmwgfx atomic has landed,
with either a local mutex (if you still have more sw cursor planes than
real ones) or no changes (if your universal cursor code is fixed to only
have one cursor for the entire device instance).

Thanks, Daniel
Thomas Hellstrom March 29, 2017, 8:04 a.m. UTC | #12
On 03/29/2017 10:00 AM, Daniel Vetter wrote:
> On Mon, Mar 27, 2017 at 10:31:51AM +0200, Thomas Hellstrom wrote:
>> On 03/27/2017 08:28 AM, Daniel Vetter wrote:
>>> We discussed this quickly on irc, transcribing.
>>>
>>> On Mon, Mar 27, 2017 at 5:01 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>>> Strictly speaking, the (virtual) hardware is too limited to support the
>>>> legacy KMS cursor API. AFAIR e.g. weston at least used to make use of HW
>>>> cursors for other surfaces, not sure that's currently the case though.
>>> That was disabled again because of lack of atomic (together with all
>>> overlay support if your driver isn't atomic). But atomic/universal
>>> planes allows us to at least model vmwgfx correctly. For each crtc
>>> we'd have one primary plane, but only one global cursor plane that we
>>> attach to the cursor slot of each crtc. Then universal/atomic aware
>>> userspace could realize that there's only 1 cursor plane and make sure
>>> it's not over-used.
>> That sounds encouraging. In practice we haven't really seen any problems
>> because most users use vmware tools,
>> which places the outputs in such a way that the cursor location visually
>> coincides for all crtcs.
>> The problem starts if someone would override tools and try to clone the
>> contents across crtcs.
>> The vmware xorg driver has some logic to try to detect such situations
>> and fall back to software cursors, and possibly we might have to, at
>> some point, implement software cursor composition in the kernel, but for
>> now we live with the potential possibilty that users will see the cursor
>> jumping across the screens..
> Ok, I've pulled in the series, except this patch plus the few cleanups
> that depend upon it. I'll respin this as soon as vmwgfx atomic has landed,
> with either a local mutex (if you still have more sw cursor planes than
> real ones) or no changes (if your universal cursor code is fixed to only
> have one cursor for the entire device instance).
>
> Thanks, Daniel

Thanks,

In the patch series we have added a local spinlock (cursor_lock) to
protect against concurrent register access.
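
The idea, as a userspace sketch only: this is not the code from the
patch series. A C11 atomic_flag stands in for the kernel spinlock, and
apart from the cursor_lock name everything below is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical device: one global cursor guarded by one lock. */
struct sketch_cursor_dev {
	atomic_flag cursor_lock;	/* userspace stand-in for a spinlock */
	int reg_x;
	int reg_y;
};

static void sketch_cursor_dev_init(struct sketch_cursor_dev *dev)
{
	atomic_flag_clear(&dev->cursor_lock);
	dev->reg_x = 0;
	dev->reg_y = 0;
}

static void sketch_cursor_lock_acquire(struct sketch_cursor_dev *dev)
{
	/* Spin until the flag was previously clear, i.e. we now own it. */
	while (atomic_flag_test_and_set_explicit(&dev->cursor_lock,
						 memory_order_acquire))
		;
}

static void sketch_cursor_lock_release(struct sketch_cursor_dev *dev)
{
	atomic_flag_clear_explicit(&dev->cursor_lock, memory_order_release);
}

/* Both register writes happen under the lock, so they cannot interleave. */
static void sketch_cursor_update_position(struct sketch_cursor_dev *dev,
					  int x, int y)
{
	sketch_cursor_lock_acquire(dev);
	dev->reg_x = x;
	dev->reg_y = y;
	sketch_cursor_lock_release(dev);
}
```

Callers on different crtcs then serialize on the one lock, so the
registers always hold a position that some caller actually requested.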

/Thomas

Patch

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index d492d57d5309..424b3fc57203 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -148,15 +148,6 @@  int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
 	s32 hotspot_x, hotspot_y;
 	int ret;
 
-	/*
-	 * FIXME: Unclear whether there's any global state touched by the
-	 * cursor_set function, especially vmw_cursor_update_position looks
-	 * suspicious. For now take the easy route and reacquire all locks. We
-	 * can do this since the caller in the drm core doesn't check anything
-	 * which is protected by any looks.
-	 */
-	drm_modeset_unlock_crtc(crtc);
-	drm_modeset_lock_all(dev_priv->dev);
 	hotspot_x = hot_x + du->hotspot_x;
 	hotspot_y = hot_y + du->hotspot_y;
 
@@ -224,9 +215,6 @@  int vmw_du_crtc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file_priv,
 	}
 
 out:
-	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock_crtc(crtc, crtc->cursor);
-
 	return ret;
 }
 
@@ -239,25 +227,12 @@  int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 	du->cursor_x = x + du->set_gui_x;
 	du->cursor_y = y + du->set_gui_y;
 
-	/*
-	 * FIXME: Unclear whether there's any global state touched by the
-	 * cursor_set function, especially vmw_cursor_update_position looks
-	 * suspicious. For now take the easy route and reacquire all locks. We
-	 * can do this since the caller in the drm core doesn't check anything
-	 * which is protected by any looks.
-	 */
-	drm_modeset_unlock_crtc(crtc);
-	drm_modeset_lock_all(dev_priv->dev);
-
 	vmw_cursor_update_position(dev_priv, shown,
 				   du->cursor_x + du->hotspot_x +
 				   du->core_hotspot_x,
 				   du->cursor_y + du->hotspot_y +
 				   du->core_hotspot_y);
 
-	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock_crtc(crtc, crtc->cursor);
-
 	return 0;
 }