diff mbox series

[v3,1/4] fbdev: Prevent possible use-after-free in fb_release()

Message ID 20220505220413.365977-1-javierm@redhat.com (mailing list archive)
State Not Applicable
Headers show
Series fbdev: Fix use-after-free caused by wrong fb_info cleanup in drivers | expand

Commit Message

Javier Martinez Canillas May 5, 2022, 10:04 p.m. UTC
From: Daniel Vetter <daniel.vetter@ffwll.ch>

Most fbdev drivers have issues with the fb_info lifetime, because call to
framebuffer_release() from their driver's .remove callback, rather than
doing from fbops.fb_destroy callback.

Doing that will destroy the fb_info too early, while references to it may
still exist, leading to a use-after-free error.

To prevent this, check the fb_info reference counter when attempting to
kfree the data structure in framebuffer_release(). That will leak it but
at least will prevent the mentioned error.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
---

(no changes since v1)

 drivers/video/fbdev/core/fbsysfs.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Andrzej Hajda May 9, 2022, 2:56 p.m. UTC | #1
On 06.05.2022 00:04, Javier Martinez Canillas wrote:
> From: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Most fbdev drivers have issues with the fb_info lifetime, because call to
> framebuffer_release() from their driver's .remove callback, rather than
> doing from fbops.fb_destroy callback.
> 
> Doing that will destroy the fb_info too early, while references to it may
> still exist, leading to a use-after-free error.
> 
> To prevent this, check the fb_info reference counter when attempting to
> kfree the data structure in framebuffer_release(). That will leak it but
> at least will prevent the mentioned error.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> 
> (no changes since v1)
> 
>   drivers/video/fbdev/core/fbsysfs.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c
> index 8c1ee9ecec3d..c2a60b187467 100644
> --- a/drivers/video/fbdev/core/fbsysfs.c
> +++ b/drivers/video/fbdev/core/fbsysfs.c
> @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info)
>   {
>   	if (!info)
>   		return;
> +
> +	if (WARN_ON(refcount_read(&info->count)))
> +		return;
> +

Regarding drm:
What about drm_fb_helper_fini? It calls also framebuffer_release and is 
called often from _remove paths (checked intel/radeon/nouveau). I guess 
it should be fixed as well. Do you plan to fix it?


Regarding fb drivers, just for stats:
git grep -p framebuffer_release | grep _remove | wc -l
Suggests there is at least 70 incorrect users of this :)

Regards
Andrzej
Javier Martinez Canillas May 9, 2022, 3:30 p.m. UTC | #2
Hello Andrzej,

On 5/9/22 16:56, Andrzej Hajda wrote:
> On 06.05.2022 00:04, Javier Martinez Canillas wrote:
>> From: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Most fbdev drivers have issues with the fb_info lifetime, because call to
>> framebuffer_release() from their driver's .remove callback, rather than
>> doing from fbops.fb_destroy callback.
>>
>> Doing that will destroy the fb_info too early, while references to it may
>> still exist, leading to a use-after-free error.
>>
>> To prevent this, check the fb_info reference counter when attempting to
>> kfree the data structure in framebuffer_release(). That will leak it but
>> at least will prevent the mentioned error.
>>
>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
>> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>>
>> (no changes since v1)
>>
>>   drivers/video/fbdev/core/fbsysfs.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c
>> index 8c1ee9ecec3d..c2a60b187467 100644
>> --- a/drivers/video/fbdev/core/fbsysfs.c
>> +++ b/drivers/video/fbdev/core/fbsysfs.c
>> @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info)
>>   {
>>   	if (!info)
>>   		return;
>> +
>> +	if (WARN_ON(refcount_read(&info->count)))
>> +		return;
>> +
> 
> Regarding drm:
> What about drm_fb_helper_fini? It calls also framebuffer_release and is 
> called often from _remove paths (checked intel/radeon/nouveau). I guess 
> it should be fixed as well. Do you plan to fix it?
>

I think you are correct. Maybe we need something like the following?

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index d265a73313c9..b09598f7af28 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
        if (info) {
                if (info->cmap.len)
                        fb_dealloc_cmap(&info->cmap);
-               framebuffer_release(info);
        }
        fb_helper->fbdev = NULL;
 
@@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
 static void drm_fbdev_fb_destroy(struct fb_info *info)
 {
        drm_fbdev_release(info->par);
+       framebuffer_release(info);
 }
 
 static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)

> 
> Regarding fb drivers, just for stats:
> git grep -p framebuffer_release | grep _remove | wc -l
> Suggests there is at least 70 incorrect users of this :)
>

Yes, Daniel already mentioned that most of them get this wrong but I was
mostly interested in {simple,efi,vesa}fb since are used with "nomodeset".

But given that I only touched those tree and still managed to introduce
a regression, I won't attempt to fix the others.
Andrzej Hajda May 9, 2022, 3:51 p.m. UTC | #3
On 09.05.2022 17:30, Javier Martinez Canillas wrote:
> Hello Andrzej,
>
> On 5/9/22 16:56, Andrzej Hajda wrote:
>> On 06.05.2022 00:04, Javier Martinez Canillas wrote:
>>> From: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>
>>> Most fbdev drivers have issues with the fb_info lifetime, because call to
>>> framebuffer_release() from their driver's .remove callback, rather than
>>> doing from fbops.fb_destroy callback.
>>>
>>> Doing that will destroy the fb_info too early, while references to it may
>>> still exist, leading to a use-after-free error.
>>>
>>> To prevent this, check the fb_info reference counter when attempting to
>>> kfree the data structure in framebuffer_release(). That will leak it but
>>> at least will prevent the mentioned error.
>>>
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
>>> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
>>> ---
>>>
>>> (no changes since v1)
>>>
>>>    drivers/video/fbdev/core/fbsysfs.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c
>>> index 8c1ee9ecec3d..c2a60b187467 100644
>>> --- a/drivers/video/fbdev/core/fbsysfs.c
>>> +++ b/drivers/video/fbdev/core/fbsysfs.c
>>> @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info)
>>>    {
>>>    	if (!info)
>>>    		return;
>>> +
>>> +	if (WARN_ON(refcount_read(&info->count)))
>>> +		return;
>>> +
>> Regarding drm:
>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
>> called often from _remove paths (checked intel/radeon/nouveau). I guess
>> it should be fixed as well. Do you plan to fix it?
>>
> I think you are correct. Maybe we need something like the following?
>
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index d265a73313c9..b09598f7af28 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>          if (info) {
>                  if (info->cmap.len)
>                          fb_dealloc_cmap(&info->cmap);
> -               framebuffer_release(info);

What about cmap? I am not an fb expert, but IMO cmap can be accessed 
from userspace as well.

Regards
Andrzej

>          }
>          fb_helper->fbdev = NULL;
>   
> @@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
>   static void drm_fbdev_fb_destroy(struct fb_info *info)
>   {
>          drm_fbdev_release(info->par);
> +       framebuffer_release(info);
>   }
>   
>   static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
>
>> Regarding fb drivers, just for stats:
>> git grep -p framebuffer_release | grep _remove | wc -l
>> Suggests there is at least 70 incorrect users of this :)
>>
> Yes, Daniel already mentioned that most of them get this wrong but I was
> mostly interested in {simple,efi,vesa}fb since are used with "nomodeset".
>
> But given that I only touched those tree and still managed to introduce
> a regression, I won't attempt to fix the others.
>
Javier Martinez Canillas May 9, 2022, 4:33 p.m. UTC | #4
On 5/9/22 17:51, Andrzej Hajda wrote:

[snip]

>>>> +
>>> Regarding drm:
>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
>>> called often from _remove paths (checked intel/radeon/nouveau). I guess
>>> it should be fixed as well. Do you plan to fix it?
>>>
>> I think you are correct. Maybe we need something like the following?
>>
>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
>> index d265a73313c9..b09598f7af28 100644
>> --- a/drivers/gpu/drm/drm_fb_helper.c
>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>>          if (info) {
>>                  if (info->cmap.len)
>>                          fb_dealloc_cmap(&info->cmap);
>> -               framebuffer_release(info);
> 
> What about cmap? I am not an fb expert, but IMO cmap can be accessed 
> from userspace as well.
> 

I actually thought about the same but then remembered what Daniel said in [0]
(AFAIU at least) that these should be done in .remove() so the current code
looks like matches that and only framebuffer_release() should be moved.

For vesafb a previous patch proposed to also move a release_region() call to
.fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].

But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that
is the correct thing to do.

[0]: https://www.spinics.net/lists/dri-devel/msg346257.html
[1]: https://www.spinics.net/lists/dri-devel/msg346261.html
Thomas Zimmermann May 9, 2022, 6:12 p.m. UTC | #5
Hi Javier

Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
> On 5/9/22 17:51, Andrzej Hajda wrote:
> 
> [snip]
> 
>>>>> +
>>>> Regarding drm:
>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess
>>>> it should be fixed as well. Do you plan to fix it?
>>>>
>>> I think you are correct. Maybe we need something like the following?
>>>
>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
>>> index d265a73313c9..b09598f7af28 100644
>>> --- a/drivers/gpu/drm/drm_fb_helper.c
>>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>>>           if (info) {
>>>                   if (info->cmap.len)
>>>                           fb_dealloc_cmap(&info->cmap);
>>> -               framebuffer_release(info);
>>
>> What about cmap? I am not an fb expert, but IMO cmap can be accessed
>> from userspace as well.
>>
> 
> I actually thought about the same but then remembered what Daniel said in [0]
> (AFAIU at least) that these should be done in .remove() so the current code
> looks like matches that and only framebuffer_release() should be moved.
> 
> For vesafb a previous patch proposed to also move a release_region() call to
> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
> 
> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that
> is the correct thing to do.

The cmap data structure is software state that can be accessed via icotl 
as long as the devfile is open. Drivers update the hardware from it. See 
[1].  Moving that cleanup into fb_destroy seems appropriate to me.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.17.6/source/drivers/video/fbdev/core/fbcmap.c#L231

> 
> [0]: https://www.spinics.net/lists/dri-devel/msg346257.html
> [1]: https://www.spinics.net/lists/dri-devel/msg346261.html
>
Thomas Zimmermann May 9, 2022, 6:32 p.m. UTC | #6
Hi

Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
> On 5/9/22 17:51, Andrzej Hajda wrote:
> 
> [snip]
> 
>>>>> +
>>>> Regarding drm:
>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess
>>>> it should be fixed as well. Do you plan to fix it?
>>>>
>>> I think you are correct. Maybe we need something like the following?
>>>
>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
>>> index d265a73313c9..b09598f7af28 100644
>>> --- a/drivers/gpu/drm/drm_fb_helper.c
>>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>>>           if (info) {
>>>                   if (info->cmap.len)
>>>                           fb_dealloc_cmap(&info->cmap);
>>> -               framebuffer_release(info);

After reviewing that code,  drm_fb_helper_fini() appears to be called 
from .fb_destroy (see drm_fbdev_release).  The code is hard to follow 
though.  If there another way of releasing the framebuffer here?

Best regards
Thomas

>>
>> What about cmap? I am not an fb expert, but IMO cmap can be accessed
>> from userspace as well.
>>
> 
> I actually thought about the same but then remembered what Daniel said in [0]
> (AFAIU at least) that these should be done in .remove() so the current code
> looks like matches that and only framebuffer_release() should be moved.
> 
> For vesafb a previous patch proposed to also move a release_region() call to
> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
> 
> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that
> is the correct thing to do.
> 
> [0]: https://www.spinics.net/lists/dri-devel/msg346257.html
> [1]: https://www.spinics.net/lists/dri-devel/msg346261.html
>
Javier Martinez Canillas May 9, 2022, 8 p.m. UTC | #7
Hello Thomas,

On 5/9/22 20:32, Thomas Zimmermann wrote:
> Hi
> 
> Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
>> On 5/9/22 17:51, Andrzej Hajda wrote:
>>
>> [snip]
>>
>>>>>> +
>>>>> Regarding drm:
>>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
>>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess
>>>>> it should be fixed as well. Do you plan to fix it?
>>>>>
>>>> I think you are correct. Maybe we need something like the following?
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
>>>> index d265a73313c9..b09598f7af28 100644
>>>> --- a/drivers/gpu/drm/drm_fb_helper.c
>>>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>>>>           if (info) {
>>>>                   if (info->cmap.len)
>>>>                           fb_dealloc_cmap(&info->cmap);
>>>> -               framebuffer_release(info);
> 
> After reviewing that code,  drm_fb_helper_fini() appears to be called 
> from .fb_destroy (see drm_fbdev_release).  The code is hard to follow 
> though.  If there another way of releasing the framebuffer here?
> 

Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915
and the call chain is the following as far as I can tell:

struct pci_driver i915_pci_driver = {
...
        .remove = i915_pci_remove,
...
};


i915_driver_remove
  intel_modeset_driver_remove_noirq
    intel_fbdev_fini
      intel_fbdev_destroy
        drm_fb_helper_fini
          framebuffer_release
              
So my underdestanding is that if a program has the emulated fbdev device
opened and the i915 module is removed, then a use-after-free would be
triggered on drm_fbdev_fb_destroy() once the program closes the device:

drm_fbdev_fb_destroy
  drm_fbdev_release(info->par); <-- info was already freed on .remove
Javier Martinez Canillas May 9, 2022, 8:03 p.m. UTC | #8
On 5/9/22 20:12, Thomas Zimmermann wrote:

[snip]

>> I actually thought about the same but then remembered what Daniel said in [0]
>> (AFAIU at least) that these should be done in .remove() so the current code
>> looks like matches that and only framebuffer_release() should be moved.
>>
>> For vesafb a previous patch proposed to also move a release_region() call to
>> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
>>
>> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that
>> is the correct thing to do.
> 
> The cmap data structure is software state that can be accessed via icotl 
> as long as the devfile is open. Drivers update the hardware from it. See 
> [1].  Moving that cleanup into fb_destroy seems appropriate to me.
> 

I see, that makes sense. Then something like the following instead?

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index d265a73313c9..ce0d89c49e42 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
        cancel_work_sync(&fb_helper->resume_work);
        cancel_work_sync(&fb_helper->damage_work);
 
-       info = fb_helper->fbdev;
-       if (info) {
-               if (info->cmap.len)
-                       fb_dealloc_cmap(&info->cmap);
-               framebuffer_release(info);
-       }
        fb_helper->fbdev = NULL;
 
        mutex_lock(&kernel_fb_helper_lock);
@@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
  */
 static void drm_fbdev_fb_destroy(struct fb_info *info)
 {
+       if (info->cmap.len)
+               fb_dealloc_cmap(&info->cmap);
+
        drm_fbdev_release(info->par);
+       framebuffer_release(info);
 }
 
 static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Andrzej Hajda May 9, 2022, 10:22 p.m. UTC | #9
On 09.05.2022 22:03, Javier Martinez Canillas wrote:
> On 5/9/22 20:12, Thomas Zimmermann wrote:
>
> [snip]
>
>>> I actually thought about the same but then remembered what Daniel said in [0]
>>> (AFAIU at least) that these should be done in .remove() so the current code
>>> looks like matches that and only framebuffer_release() should be moved.
>>>
>>> For vesafb a previous patch proposed to also move a release_region() call to
>>> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
>>>
>>> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that
>>> is the correct thing to do.
>> The cmap data structure is software state that can be accessed via icotl
>> as long as the devfile is open. Drivers update the hardware from it. See
>> [1].  Moving that cleanup into fb_destroy seems appropriate to me.
>>
> I see, that makes sense. Then something like the following instead?
>
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index d265a73313c9..ce0d89c49e42 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>          cancel_work_sync(&fb_helper->resume_work);
>          cancel_work_sync(&fb_helper->damage_work);
>   
> -       info = fb_helper->fbdev;
> -       if (info) {
> -               if (info->cmap.len)
> -                       fb_dealloc_cmap(&info->cmap);
> -               framebuffer_release(info);
> -       }
>          fb_helper->fbdev = NULL;
>   
>          mutex_lock(&kernel_fb_helper_lock);
> @@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
>    */
>   static void drm_fbdev_fb_destroy(struct fb_info *info)
>   {
> +       if (info->cmap.len)
> +               fb_dealloc_cmap(&info->cmap);
> +
>          drm_fbdev_release(info->par);
> +       framebuffer_release(info);

I would put drm_fbdev_release at the beginning - it cancels workers 
which could expect cmap to be still valid.

Regards
Andrzej


>   }
>   
>   static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
>
Javier Martinez Canillas May 9, 2022, 10:42 p.m. UTC | #10
On 5/10/22 00:22, Andrzej Hajda wrote:

[snip]

>>   static void drm_fbdev_fb_destroy(struct fb_info *info)
>>   {
>> +       if (info->cmap.len)
>> +               fb_dealloc_cmap(&info->cmap);
>> +
>>          drm_fbdev_release(info->par);
>> +       framebuffer_release(info);
> 
> I would put drm_fbdev_release at the beginning - it cancels workers 
> which could expect cmap to be still valid.
> 

Indeed, you are correct again. [0] is the final version of the patch I've
but don't have an i915 test machine to give it a try. I'll test tomorrow
on my test systems to verify that it doesn't cause any regressions since
with other DRM drivers.

I think that besides this patch, drivers shouldn't need to call to the
drm_fb_helper_fini() function directly. Since that would be called during
drm_fbdev_fb_destroy() anyways.

We should probably remove that call in all drivers and make this helper
function static and just private to drm_fb_helper functions.

Or am I missing something here ?

[0]:
From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javierm@redhat.com>
Date: Tue May 10 00:39:55 2022 +0200
Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info
 too early

Currently these are done in drm_fb_helper_fini() but this helper is called
by drivers in their .remove callback, which could lead to a use-after-free
if a process has opened the emulated fbdev node while a driver is removed.

For example, in i915 driver the call chain during remove is the following:

struct pci_driver i915_pci_driver = {
...
        .remove = i915_pci_remove,
...
};

i915_pci_remove
  i915_driver_remove
    intel_modeset_driver_remove_noirq
      intel_fbdev_fini
        intel_fbdev_destroy
          drm_fb_helper_fini
            framebuffer_release

Later the process will close the fbdev node file descriptor leading to the
mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:

drm_fbdev_fb_destroy
  drm_fbdev_release(info->par); <-- info was already freed on .remove

To prevent that, let's move the framebuffer_release() call to the end of
the drm_fbdev_fb_destroy() function.

Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early
and is more correct to do it in drm_fbdev_fb_destroy() as well. After a
call to drm_fbdev_release() has been made.

Reported-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---
 drivers/gpu/drm/drm_fb_helper.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index d265a73313c9..7288fbd26bcc 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
 	cancel_work_sync(&fb_helper->resume_work);
 	cancel_work_sync(&fb_helper->damage_work);
 
-	info = fb_helper->fbdev;
-	if (info) {
-		if (info->cmap.len)
-			fb_dealloc_cmap(&info->cmap);
-		framebuffer_release(info);
-	}
 	fb_helper->fbdev = NULL;
 
 	mutex_lock(&kernel_fb_helper_lock);
@@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
 static void drm_fbdev_fb_destroy(struct fb_info *info)
 {
 	drm_fbdev_release(info->par);
+	if (info->cmap.len)
+		fb_dealloc_cmap(&info->cmap);
+	framebuffer_release(info);
 }
 
 static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Andrzej Hajda May 10, 2022, 7:19 a.m. UTC | #11
On 10.05.2022 00:42, Javier Martinez Canillas wrote:
> On 5/10/22 00:22, Andrzej Hajda wrote:
>
> [snip]
>
>>>    static void drm_fbdev_fb_destroy(struct fb_info *info)
>>>    {
>>> +       if (info->cmap.len)
>>> +               fb_dealloc_cmap(&info->cmap);
>>> +
>>>           drm_fbdev_release(info->par);
>>> +       framebuffer_release(info);
>> I would put drm_fbdev_release at the beginning - it cancels workers
>> which could expect cmap to be still valid.
>>
> Indeed, you are correct again. [0] is the final version of the patch I've
> but don't have an i915 test machine to give it a try. I'll test tomorrow
> on my test systems to verify that it doesn't cause any regressions since
> with other DRM drivers.
>
> I think that besides this patch, drivers shouldn't need to call to the
> drm_fb_helper_fini() function directly. Since that would be called during
> drm_fbdev_fb_destroy() anyways.
>
> We should probably remove that call in all drivers and make this helper
> function static and just private to drm_fb_helper functions.
>
> Or am I missing something here ?

This is question for experts :)
I do not know what are user API/ABI expectations regarding removal of 
fbdev driver, I wonder if they are documented somewhere :)
Apparently we have some process of 'zombification'  here - we need to 
remove the driver without waiting for userspace closing framebuffer(???) 
(to unbind ops-es and remove references to driver related things), but 
we need to leave some structures to fool userspace, 'info' seems to be 
one of them.
So I guess there should be something called on driver's _remove path, 
and sth on destroy path.

Regards
Andrzej

>
> [0]:
>  From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas <javierm@redhat.com>
> Date: Tue May 10 00:39:55 2022 +0200
> Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info
>   too early
>
> Currently these are done in drm_fb_helper_fini() but this helper is called
> by drivers in their .remove callback, which could lead to a use-after-free
> if a process has opened the emulated fbdev node while a driver is removed.
>
> For example, in i915 driver the call chain during remove is the following:
>
> struct pci_driver i915_pci_driver = {
> ...
>          .remove = i915_pci_remove,
> ...
> };
>
> i915_pci_remove
>    i915_driver_remove
>      intel_modeset_driver_remove_noirq
>        intel_fbdev_fini
>          intel_fbdev_destroy
>            drm_fb_helper_fini
>              framebuffer_release
>
> Later the process will close the fbdev node file descriptor leading to the
> mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:
>
> drm_fbdev_fb_destroy
>    drm_fbdev_release(info->par); <-- info was already freed on .remove
>
> To prevent that, let's move the framebuffer_release() call to the end of
> the drm_fbdev_fb_destroy() function.
>
> Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early
> and is more correct to do it in drm_fbdev_fb_destroy() as well. After a
> call to drm_fbdev_release() has been made.
>
> Reported-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> ---
>   drivers/gpu/drm/drm_fb_helper.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index d265a73313c9..7288fbd26bcc 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>   	cancel_work_sync(&fb_helper->resume_work);
>   	cancel_work_sync(&fb_helper->damage_work);
>   
> -	info = fb_helper->fbdev;
> -	if (info) {
> -		if (info->cmap.len)
> -			fb_dealloc_cmap(&info->cmap);
> -		framebuffer_release(info);
> -	}
>   	fb_helper->fbdev = NULL;
>   
>   	mutex_lock(&kernel_fb_helper_lock);
> @@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
>   static void drm_fbdev_fb_destroy(struct fb_info *info)
>   {
>   	drm_fbdev_release(info->par);
> +	if (info->cmap.len)
> +		fb_dealloc_cmap(&info->cmap);
> +	framebuffer_release(info);
>   }
>   
>   static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Javier Martinez Canillas May 10, 2022, 7:50 a.m. UTC | #12
On 5/10/22 09:19, Andrzej Hajda wrote:
> 
> 
> On 10.05.2022 00:42, Javier Martinez Canillas wrote:
>> On 5/10/22 00:22, Andrzej Hajda wrote:
>>
>> [snip]
>>
>>>>    static void drm_fbdev_fb_destroy(struct fb_info *info)
>>>>    {
>>>> +       if (info->cmap.len)
>>>> +               fb_dealloc_cmap(&info->cmap);
>>>> +
>>>>           drm_fbdev_release(info->par);
>>>> +       framebuffer_release(info);
>>> I would put drm_fbdev_release at the beginning - it cancels workers
>>> which could expect cmap to be still valid.
>>>
>> Indeed, you are correct again. [0] is the final version of the patch I've
>> but don't have an i915 test machine to give it a try. I'll test tomorrow
>> on my test systems to verify that it doesn't cause any regressions since
>> with other DRM drivers.
>>
>> I think that besides this patch, drivers shouldn't need to call to the
>> drm_fb_helper_fini() function directly. Since that would be called during
>> drm_fbdev_fb_destroy() anyways.
>>
>> We should probably remove that call in all drivers and make this helper
>> function static and just private to drm_fb_helper functions.
>>
>> Or am I missing something here ?
> 
> This is question for experts :)

Fair. I'm definitely not one of them :)

> I do not know what are user API/ABI expectations regarding removal of 
> fbdev driver, I wonder if they are documented somewhere :)

I don't know. At least I haven't found them.

> Apparently we have some process of 'zombification'  here - we need to 
> remove the driver without waiting for userspace closing framebuffer(???) 
> (to unbind ops-es and remove references to driver related things), but 
> we need to leave some structures to fool userspace, 'info' seems to be 
> one of them.

That's correct, yes. I think that any driver that provides a .mmap file
operation would have the same issue. But drivers keep an internal state
and just return -ENODEV or whatever on read/write/close after a removal.

The fbdev subsystem is different though since as you said it, the fbdev
core unconditionally calls to the driver .fb_release() callback with a
struct fb_info reference as argument.

I tried to prevent that with commit aafa025c76dc ("fbdev: Make fb_release()
return -ENODEV if fbdev was unregistered") but Daniel pointed out that
is was wrong since could leak memory allocated and was expected to be
freed on release.

That's why I instead fixed the issue in the fbdev drivers and just added
a warn on fb_release(), that is $SUBJECT.

> So I guess there should be something called on driver's _remove path, 
> and sth on destroy path.
>

That was my question actually, do we need something to be called in the
destroy path ? Since that could just be internal to the DRM fb helpers.

In other words, drivers should only care about setting a generic fbdev
by calling drm_fbdev_generic_setup(), and then do any HW cleanup in the
removal path, but let the fb helpers to handle the SW cleanup in destroy.
Thomas Zimmermann May 10, 2022, 8:04 a.m. UTC | #13
Hi

Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
> On 5/10/22 00:22, Andrzej Hajda wrote:
> 
> [snip]
> 
>>>    static void drm_fbdev_fb_destroy(struct fb_info *info)
>>>    {
>>> +       if (info->cmap.len)
>>> +               fb_dealloc_cmap(&info->cmap);
>>> +
>>>           drm_fbdev_release(info->par);
>>> +       framebuffer_release(info);
>>
>> I would put drm_fbdev_release at the beginning - it cancels workers
>> which could expect cmap to be still valid.
>>
> 
> Indeed, you are correct again. [0] is the final version of the patch I've
> but don't have an i915 test machine to give it a try. I'll test tomorrow
> on my test systems to verify that it doesn't cause any regressions since
> with other DRM drivers.

You have to go through all DRM drivers that call drm_fb_helper_fini() 
and make sure that they free fb_info. For example armada appears to be 
leaking now. [1]

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armada_fbdev.c#L152

> 
> I think that besides this patch, drivers shouldn't need to call to the
> drm_fb_helper_fini() function directly. Since that would be called during
> drm_fbdev_fb_destroy() anyways.
> 
> We should probably remove that call in all drivers and make this helper
> function static and just private to drm_fb_helper functions.
> 
> Or am I missing something here ?
> 
> [0]:
>  From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas <javierm@redhat.com>
> Date: Tue May 10 00:39:55 2022 +0200
> Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info
>   too early
> 
> Currently these are done in drm_fb_helper_fini() but this helper is called
> by drivers in their .remove callback, which could lead to a use-after-free
> if a process has opened the emulated fbdev node while a driver is removed.
> 
> For example, in i915 driver the call chain during remove is the following:
> 
> struct pci_driver i915_pci_driver = {
> ...
>          .remove = i915_pci_remove,
> ...
> };
> 
> i915_pci_remove
>    i915_driver_remove
>      intel_modeset_driver_remove_noirq
>        intel_fbdev_fini
>          intel_fbdev_destroy
>            drm_fb_helper_fini
>              framebuffer_release
> 
> Later the process will close the fbdev node file descriptor leading to the
> mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:
> 
> drm_fbdev_fb_destroy
>    drm_fbdev_release(info->par); <-- info was already freed on .remove
> 
> To prevent that, let's move the framebuffer_release() call to the end of
> the drm_fbdev_fb_destroy() function.
> 
> Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early
> and is more correct to do it in drm_fbdev_fb_destroy() as well. After a
> call to drm_fbdev_release() has been made.
> 
> Reported-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> ---
>   drivers/gpu/drm/drm_fb_helper.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index d265a73313c9..7288fbd26bcc 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
>   	cancel_work_sync(&fb_helper->resume_work);
>   	cancel_work_sync(&fb_helper->damage_work);
>   
> -	info = fb_helper->fbdev;
> -	if (info) {
> -		if (info->cmap.len)
> -			fb_dealloc_cmap(&info->cmap);
> -		framebuffer_release(info);
> -	}
>   	fb_helper->fbdev = NULL;
>   
>   	mutex_lock(&kernel_fb_helper_lock);
> @@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
>   static void drm_fbdev_fb_destroy(struct fb_info *info)
>   {
>   	drm_fbdev_release(info->par);
> +	if (info->cmap.len)
> +		fb_dealloc_cmap(&info->cmap);
> +	framebuffer_release(info);
>   }
>   
>   static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Javier Martinez Canillas May 10, 2022, 8:30 a.m. UTC | #14
Hello Thomas,

On 5/10/22 10:04, Thomas Zimmermann wrote:
> Hi
> 
> Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
>> On 5/10/22 00:22, Andrzej Hajda wrote:
>>
>> [snip]
>>
>>>>    static void drm_fbdev_fb_destroy(struct fb_info *info)
>>>>    {
>>>> +       if (info->cmap.len)
>>>> +               fb_dealloc_cmap(&info->cmap);
>>>> +
>>>>           drm_fbdev_release(info->par);
>>>> +       framebuffer_release(info);
>>>
>>> I would put drm_fbdev_release at the beginning - it cancels workers
>>> which could expect cmap to be still valid.
>>>
>>
>> Indeed, you are correct again. [0] is the final version of the patch I've
>> but don't have an i915 test machine to give it a try. I'll test tomorrow
>> on my test systems to verify that it doesn't cause any regressions since
>> with other DRM drivers.
> 
> You have to go through all DRM drivers that call drm_fb_helper_fini() 
> and make sure that they free fb_info. For example armada appears to be 
> leaking now. [1]
>

But shouldn't fb_info be freed when unregister_framebuffer() is called
through drm_dev_unregister() ? AFAICT the call chain is the following:

drm_put_dev()
  drm_dev_unregister()
    drm_client_dev_unregister()
      drm_fbdev_client_unregister()
        drm_fb_helper_unregister_fbi()
          unregister_framebuffer()
            do_unregister_framebuffer()
              put_fb_info()
                drm_fbdev_fb_destroy()
                  framebuffer_release()

which is the reason why I believe that drm_fb_helper_fini() should be
an internal static function and only called from drm_fbdev_fb_destroy().

Drivers shouldn't really explicitly call this helper in my opinion.
Thomas Zimmermann May 10, 2022, 8:37 a.m. UTC | #15
Hi

Am 10.05.22 um 10:30 schrieb Javier Martinez Canillas:
> Hello Thomas,
> 
> On 5/10/22 10:04, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
>>> On 5/10/22 00:22, Andrzej Hajda wrote:
>>>
>>> [snip]
>>>
>>>>>     static void drm_fbdev_fb_destroy(struct fb_info *info)
>>>>>     {
>>>>> +       if (info->cmap.len)
>>>>> +               fb_dealloc_cmap(&info->cmap);
>>>>> +
>>>>>            drm_fbdev_release(info->par);
>>>>> +       framebuffer_release(info);
>>>>
>>>> I would put drm_fbdev_release at the beginning - it cancels workers
>>>> which could expect cmap to be still valid.
>>>>
>>>
>>> Indeed, you are correct again. [0] is the final version of the patch I've
>>> but don't have an i915 test machine to give it a try. I'll test tomorrow
>>> on my test systems to verify that it doesn't cause any regressions since
>>> with other DRM drivers.
>>
>> You have to go through all DRM drivers that call drm_fb_helper_fini()
>> and make sure that they free fb_info. For example armada appears to be
>> leaking now. [1]
>>
> 
> But shouldn't fb_info be freed when unregister_framebuffer() is called
> through drm_dev_unregister() ? AFAICT the call chain is the following:
> 
> drm_put_dev()
>    drm_dev_unregister()
>      drm_client_dev_unregister()
>        drm_fbdev_client_unregister()
>          drm_fb_helper_unregister_fbi()
>            unregister_framebuffer()
>              do_unregister_framebuffer()
>                put_fb_info()
>                  drm_fbdev_fb_destroy()
>                    framebuffer_release()
> 
> which is the reason why I believe that drm_fb_helper_fini() should be
> an internal static function and only called from drm_fbdev_fb_destroy().
> 
> Drivers shouldn't really explicitly call this helper in my opinion.

Thanks.  That makes sense.

Best regards
Thomas


>
Thomas Zimmermann May 10, 2022, 8:50 a.m. UTC | #16
Hi

Am 10.05.22 um 10:37 schrieb Thomas Zimmermann:
...
>>>
>>> You have to go through all DRM drivers that call drm_fb_helper_fini()
>>> and make sure that they free fb_info. For example armada appears to be
>>> leaking now. [1]
>>>
>>
>> But shouldn't fb_info be freed when unregister_framebuffer() is called
>> through drm_dev_unregister() ? AFAICT the call chain is the following:
>>
>> drm_put_dev()
>>    drm_dev_unregister()
>>      drm_client_dev_unregister()
>>        drm_fbdev_client_unregister()
>>          drm_fb_helper_unregister_fbi()
>>            unregister_framebuffer()
>>              do_unregister_framebuffer()
>>                put_fb_info()
>>                  drm_fbdev_fb_destroy()
>>                    framebuffer_release()
>>
>> which is the reason why I believe that drm_fb_helper_fini() should be
>> an internal static function and only called from drm_fbdev_fb_destroy().
>>
>> Drivers shouldn't really explicitly call this helper in my opinion.

One more stupid question: does armada actually use 
drm_fbdev_fb_destroy()? It's supposed to be a callback for struct 
fb_ops. Armada uses it's own instance of fb_ops, which apparently 
doesn't contain fb_destroy. [1]

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armada_fbdev.c#L19


> 
> Thanks.  That makes sense.
> 
> Best regards
> Thomas
> 
> 
>>
>
Javier Martinez Canillas May 10, 2022, 9:06 a.m. UTC | #17
Hello Thomas,

On 5/10/22 10:50, Thomas Zimmermann wrote:

[snip]

>>> Drivers shouldn't really explicitly call this helper in my opinion.
> 
> One more stupid question: does armada actually use 
> drm_fbdev_fb_destroy()? It's supposed to be a callback for struct 
> fb_ops. Armada uses it's own instance of fb_ops, which apparently 
> doesn't contain fb_destroy. [1]
>

No stupid question at all. You are correct on this. So I guess we still
need this call in the drivers that don't provide a .fb_destroy() handler.

I see many options here:

1) Document in drm_fb_helper_alloc_fbi() that drivers only need to call
   drm_fb_helper_fini() explicitly if they are not setting up a fbdev
   with drm_fbdev_generic_setup(), otherwise is not needed.

2) Make drm_fbdev_fb_destroy() an exported symbol so drivers that have
   custom fb_ops can use it.

3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when
   they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().

I'm leaning towards option (3). Then the fb_info release will be automatic
whether drivers are using the generic setup or a custom one.
Thomas Zimmermann May 10, 2022, 9:39 a.m. UTC | #18
Hi Javier

Am 10.05.22 um 11:06 schrieb Javier Martinez Canillas:
> Hello Thomas,
> 
> On 5/10/22 10:50, Thomas Zimmermann wrote:
> 
> [snip]
> 
>>>> Drivers shouldn't really explicitly call this helper in my opinion.
>>
>> One more stupid question: does armada actually use
>> drm_fbdev_fb_destroy()? It's supposed to be a callback for struct
>> fb_ops. Armada uses it's own instance of fb_ops, which apparently
>> doesn't contain fb_destroy. [1]
>>
> 
> No stupid question at all. You are correct on this. So I guess we still
> need this call in the drivers that don't provide a .fb_destroy() handler.
> 
> I see many options here:
> 
> 1) Document in drm_fb_helper_alloc_fbi() that drivers only need to call
>     drm_fb_helper_fini() explicitly if they are not setting up a fbdev
>     with drm_fbdev_generic_setup(), otherwise is not needed.
> 
> 2) Make drm_fbdev_fb_destroy() an exported symbol so drivers that have
>     custom fb_ops can use it.
> 
> 3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when
>     they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().
> 
> I'm leaning towards option (3). Then the fb_info release will be automatic
> whether drivers are using the generic setup or a custom one.

IMHO this would just be another glitch to paper over all the broken 
code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call 
_fini at some point and probably blow up in some other way. Instances of 
struct fb_ops are also usually const.

The only reliable way AFAICT is to do what generic fbdev does: use 
unregister_framebuffer and do the software cleanup somewhere within 
fb_destroy. And then fix all drivers to use that pattern.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/drm_fb_helper.c#L2082

>
Javier Martinez Canillas May 10, 2022, 9:44 a.m. UTC | #19
On 5/10/22 11:39, Thomas Zimmermann wrote:

[snip]

>>
>> 3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when
>>     they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().
>>
>> I'm leaning towards option (3). Then the fb_info release will be automatic
>> whether drivers are using the generic setup or a custom one.
> 
> IMHO this would just be another glitch to paper over all the broken 
> code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call 
> _fini at some point and probably blow up in some other way. Instances of 
> struct fb_ops are also usually const.
> 
> The only reliable way AFAICT is to do what generic fbdev does: use 
> unregister_framebuffer and do the software cleanup somewhere within 
> fb_destroy. And then fix all drivers to use that pattern.
> 

Right. We can't really abstract this away from drivers that are not
using the generic fbdev helpers. So then they will have to provide
their own .fb_destroy() callback and do the cleanup.
Daniel Vetter May 11, 2022, 1:15 p.m. UTC | #20
On Mon, May 09, 2022 at 10:00:41PM +0200, Javier Martinez Canillas wrote:
> Hello Thomas,
> 
> On 5/9/22 20:32, Thomas Zimmermann wrote:
> > Hi
> > 
> > Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
> >> On 5/9/22 17:51, Andrzej Hajda wrote:
> >>
> >> [snip]
> >>
> >>>>>> +
> >>>>> Regarding drm:
> >>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is
> >>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess
> >>>>> it should be fixed as well. Do you plan to fix it?
> >>>>>
> >>>> I think you are correct. Maybe we need something like the following?
> >>>>
> >>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> >>>> index d265a73313c9..b09598f7af28 100644
> >>>> --- a/drivers/gpu/drm/drm_fb_helper.c
> >>>> +++ b/drivers/gpu/drm/drm_fb_helper.c
> >>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
> >>>>           if (info) {
> >>>>                   if (info->cmap.len)
> >>>>                           fb_dealloc_cmap(&info->cmap);
> >>>> -               framebuffer_release(info);
> > 
> > After reviewing that code,  drm_fb_helper_fini() appears to be called 
> > from .fb_destroy (see drm_fbdev_release).  The code is hard to follow 
> > though.  If there another way of releasing the framebuffer here?
> > 
> 
> Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915
> and the call chain is the following as far as I can tell:
> 
> struct pci_driver i915_pci_driver = {
> ...
>         .remove = i915_pci_remove,
> ...
> };
> 
> 
> i915_driver_remove
>   intel_modeset_driver_remove_noirq
>     intel_fbdev_fini
>       intel_fbdev_destroy
>         drm_fb_helper_fini
>           framebuffer_release
>               
> So my underdestanding is that if a program has the emulated fbdev device
> opened and the i915 module is removed, then a use-after-free would be
> triggered on drm_fbdev_fb_destroy() once the program closes the device:
> 
> drm_fbdev_fb_destroy
>   drm_fbdev_release(info->par); <-- info was already freed on .remove

Yeah the old drivers that haven't switched over to the drm_client based
fbdev emulations are all kinds of wrong and release it too early.

Note that they don't use the provided ->remove hook, since that would
result in double-cleanup like you point out. Those old drivers work more
like all the other fbdev drivers where all the cleanup is done in
->remove, and if it's a real hotunplug you just die in fire :-/

Switching them all over to the drm_client based fbdev helpers and
unexporting these old (dangerous!) functions would be really neat. But
it's also a loooooot of work, and generally those big drivers don't get
hotunplugged.
-Daniel
Daniel Vetter May 11, 2022, 1:18 p.m. UTC | #21
On Tue, May 10, 2022 at 09:50:38AM +0200, Javier Martinez Canillas wrote:
> On 5/10/22 09:19, Andrzej Hajda wrote:
> > 
> > 
> > On 10.05.2022 00:42, Javier Martinez Canillas wrote:
> >> On 5/10/22 00:22, Andrzej Hajda wrote:
> >>
> >> [snip]
> >>
> >>>>    static void drm_fbdev_fb_destroy(struct fb_info *info)
> >>>>    {
> >>>> +       if (info->cmap.len)
> >>>> +               fb_dealloc_cmap(&info->cmap);
> >>>> +
> >>>>           drm_fbdev_release(info->par);
> >>>> +       framebuffer_release(info);
> >>> I would put drm_fbdev_release at the beginning - it cancels workers
> >>> which could expect cmap to be still valid.
> >>>
> >> Indeed, you are correct again. [0] is the final version of the patch I've
> >> but don't have an i915 test machine to give it a try. I'll test tomorrow
> >> on my test systems to verify that it doesn't cause any regressions since
> >> with other DRM drivers.
> >>
> >> I think that besides this patch, drivers shouldn't need to call to the
> >> drm_fb_helper_fini() function directly. Since that would be called during
> >> drm_fbdev_fb_destroy() anyways.
> >>
> >> We should probably remove that call in all drivers and make this helper
> >> function static and just private to drm_fb_helper functions.
> >>
> >> Or am I missing something here ?
> > 
> > This is question for experts :)
> 
> Fair. I'm definitely not one of them :)
> 
> > I do not know what are user API/ABI expectations regarding removal of 
> > fbdev driver, I wonder if they are documented somewhere :)
> 
> I don't know. At least I haven't found them.
> 
> > Apparently we have some process of 'zombification'  here - we need to 
> > remove the driver without waiting for userspace closing framebuffer(???) 
> > (to unbind ops-es and remove references to driver related things), but 
> > we need to leave some structures to fool userspace, 'info' seems to be 
> > one of them.
> 
> That's correct, yes. I think that any driver that provides a .mmap file
> operation would have the same issue. But drivers keep an internal state
> and just return -ENODEV or whatever on read/write/close after a removal.

Just commenting on the mmap part here. I think there's two options:

- shadow buffer for fbdev defio, and keep the shadow buffer around until
  fb_destroy

- redirect fbdev mmap fully to gem mmap, and make sure the gem mmap is
  hotunplug safe. The approach amd folks are pushing for that we discussed
  is to replace them all with a dummy r/w page, because removing the ptes
  means you can get a SIGBUS almost anywhere in application code, and that
  violates like all the assumptions behind gl/vk and would just crash your
  desktop. Reading/writing garbage otoh is generally much better.

So yeah hotunplug safe fbdev mmap is still quite a bit of work ...

Cheers, Daniel
> 
> The fbdev subsystem is different though since as you said it, the fbdev
> core unconditionally calls to the driver .fb_release() callback with a
> struct fb_info reference as argument.
> 
> I tried to prevent that with commit aafa025c76dc ("fbdev: Make fb_release()
> return -ENODEV if fbdev was unregistered") but Daniel pointed out that
> is was wrong since could leak memory allocated and was expected to be
> freed on release.
> 
> That's why I instead fixed the issue in the fbdev drivers and just added
> a warn on fb_release(), that is $SUBJECT.
> 
> > So I guess there should be something called on driver's _remove path, 
> > and sth on destroy path.
> >
> 
> That was my question actually, do we need something to be called in the
> destroy path ? Since that could just be internal to the DRM fb helpers.
> 
> In other words, drivers should only care about setting a generic fbdev
> by calling drm_fbdev_generic_setup(), and then do any HW cleanup in the
> removal path, but let the fb helpers to handle the SW cleanup in destroy.
>  
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
>
diff mbox series

Patch

diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c
index 8c1ee9ecec3d..c2a60b187467 100644
--- a/drivers/video/fbdev/core/fbsysfs.c
+++ b/drivers/video/fbdev/core/fbsysfs.c
@@ -80,6 +80,10 @@  void framebuffer_release(struct fb_info *info)
 {
 	if (!info)
 		return;
+
+	if (WARN_ON(refcount_read(&info->count)))
+		return;
+
 	kfree(info->apertures);
 	kfree(info);
 }