diff mbox

drm: fix i_mapping and f_mapping initialization in drm_open in error path

Message ID Pine.GSO.4.64.1304011249370.27674@umail (mailing list archive)
State New, archived
Headers show

Commit Message

Ilija Hadzic April 1, 2013, 6:14 p.m. UTC
On Sun, 31 Mar 2013, Michal Hocko wrote:

> On Sat 30-03-13 18:26:53, Ilija Hadzic wrote:
>> This looks a bit like a hack and it doesn't look right,
>> conceptually. If the call fails, it should restore things as if
>> nothing has ever happened and overwriting old_mapping is not going to
>> do the trick.
>
> OK, I thought this is what the patch does as it falls back to
> &inode->i_data which is the default mapping for all inodes or it uses
> what used to be in device mapping.
>
> I am obviously not familiar with the drm code but it feels a bit strange
> that the device mapping can be different than inode's resp. file's one

The reason for this is explained in commit message associated with
949c4a34.

In summary, the device's mapping is that of the inode associated with the
first opener. Before 949c4a34, subsequent openers would have to come in
through exactly the same inode that the first opener came in (otherwise 
the open call would fail). So if a user did something like: start X, 
remove /dev/dri/cardN file, mknod the same file again, the applications 
started after such an action would stop working. Also, using the GPU from 
chroot-ed environment was not possible if there was another opener from 
different root.

The 949c4a34, removed this restriction, but introduced a problem with 
VmWare GPU drivers, which fdb40a08. However, fdb40a08 introduced the bug 
that you have reported.

The problem that I have with your proposed fix is that if the first opener 
fails, it can set the device's mapping to that of the inode that was never 
used and never opened (and could even be removed later down the road).

> and even more confusing that inode and file are saved separately.
>

I was trying to quickly get out the patch that was safe in terms of 
introducing new breakage. So the "conservative" thing to do (without 
having to think through all possible scenarios) was to restore each of the 
three pointers from their own temporary variable. Thinking about it, you 
are probably right that file descriptor's and inode's mapping pointer are 
equal when open call is entered so we could use one variable. However, you 
still need a separate variable to store the device's mapping pointer 
because that one can be different.

Attached is a v2 of the patch, for reference. I would appreciate if the 
original reporter or you tested it in lieu of your proposed patch and let 
me know if it fixes your issue.

-- Ilija
From 7e3c832158e2552e5e106a588e2b9e61c35b68f2 Mon Sep 17 00:00:00 2001
From: Ilija Hadzic <ihadzic@research.bell-labs.com>

Date: Sat, 30 Mar 2013 18:20:35 -0400
Subject: [PATCH] drm: correctly restore mappings if drm_open fails

If first drm_open fails, the error-handling path will
incorrectly restore inode's mapping to NULL. This can
cause the crash later on. Fix by separately storing
away mapping pointers that drm_open can touch and
restore each from its own respective variable if the
call fails.

Reference:
http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html

v2: use one variable to store file and inode mapping
    since they are the same at the function entry; also
    fix spelling mistakes in commit message.

Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>

Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org

Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>

---
 drivers/gpu/drm/drm_fops.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

-- 
1.7.4.1

Comments

Michal Hocko April 2, 2013, 8:25 a.m. UTC | #1
On Mon 01-04-13 13:14:50, Ilija Hadzic wrote:
> 
> 
> On Sun, 31 Mar 2013, Michal Hocko wrote:
> 
> >On Sat 30-03-13 18:26:53, Ilija Hadzic wrote:
> >>This looks a bit like a hack and it doesn't look right,
> >>conceptually. If the call fails, it should restore things as if
> >>nothing has ever happened and overwriting old_mapping is not going to
> >>do the trick.
> >
> >OK, I thought this is what the patch does as it falls back to
> >&inode->i_data which is the default mapping for all inodes or it uses
> >what used to be in device mapping.
> >
> >I am obviously not familiar with the drm code but it feels a bit strange
> >that the device mapping can be different than inode's resp. file's one
> 
> The reason for this is explained in commit message associated with
> 949c4a34.
> 
> In summary, the device's mapping is that of the inode associated with the
> first opener. Before 949c4a34, subsequent openers would have to come in
> through exactly the same inode that the first opener came in
> (otherwise the open call would fail). So if a user did something
> like: start X, remove /dev/dri/cardN file, mknod the same file
> again, the applications started after such an action would stop
> working. Also, using the GPU from chroot-ed environment was not
> possible if there was another opener from different root.

Oh, I see. Thanks for the clarification.

> The 949c4a34, removed this restriction, but introduced a problem
> with VmWare GPU drivers, which fdb40a08. However, fdb40a08
> introduced the bug that you have reported.
> 
> The problem that I have with your proposed fix is that if the first
> opener fails, it can set the device's mapping to that of the inode
> that was never used and never opened (and could even be removed
> later down the road).

Makes sense.

> >and even more confusing that inode and file are saved separately.
> >
> 
> I was trying to quickly get out the patch that was safe in terms of
> introducing new breakage. So the "conservative" thing to do (without
> having to think through all possible scenarios) was to restore each
> of the three pointers from their own temporary variable. Thinking
> about it, you are probably right that file descriptor's and inode's
> mapping pointer are equal when open call is entered so we could use
> one variable. However, you still need a separate variable to store
> the device's mapping pointer because that one can be different.

Right.

> Attached is a v2 of the patch, for reference. I would appreciate if
> the original reporter or you tested it in lieu of your proposed
> patch and let me know if it fixes your issue.

OK, this is a call for Marco. I have attached this bug to our bugzilla
as well (just for reference:
https://bugzilla.novell.com/show_bug.cgi?id=807850)

> 
> -- Ilija

> From 7e3c832158e2552e5e106a588e2b9e61c35b68f2 Mon Sep 17 00:00:00 2001
> From: Ilija Hadzic <ihadzic@research.bell-labs.com>
> Date: Sat, 30 Mar 2013 18:20:35 -0400
> Subject: [PATCH] drm: correctly restore mappings if drm_open fails
> 
> If first drm_open fails, the error-handling path will
> incorrectly restore inode's mapping to NULL. This can
> cause the crash later on. Fix by separately storing
> away mapping pointers that drm_open can touch and
> restore each from its own respective variable if the
> call fails.
> 
> Reference:
> http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html
> 
> v2: use one variable to store file and inode mapping
>     since they are the same at the function entry; also
>     fix spelling mistakes in commit message.
> 
> Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
> Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: stable@vger.kernel.org

Feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!

> Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
> ---
>  drivers/gpu/drm/drm_fops.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> index 13fdcd1..429e07d 100644
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -123,6 +123,7 @@ int drm_open(struct inode *inode, struct file *filp)
>  	int retcode = 0;
>  	int need_setup = 0;
>  	struct address_space *old_mapping;
> +	struct address_space *old_imapping;
>  
>  	minor = idr_find(&drm_minors_idr, minor_id);
>  	if (!minor)
> @@ -137,6 +138,7 @@ int drm_open(struct inode *inode, struct file *filp)
>  	if (!dev->open_count++)
>  		need_setup = 1;
>  	mutex_lock(&dev->struct_mutex);
> +	old_imapping = inode->i_mapping;
>  	old_mapping = dev->dev_mapping;
>  	if (old_mapping == NULL)
>  		dev->dev_mapping = &inode->i_data;
> @@ -159,8 +161,8 @@ int drm_open(struct inode *inode, struct file *filp)
>  
>  err_undo:
>  	mutex_lock(&dev->struct_mutex);
> -	filp->f_mapping = old_mapping;
> -	inode->i_mapping = old_mapping;
> +	filp->f_mapping = old_imapping;
> +	inode->i_mapping = old_imapping;
>  	iput(container_of(dev->dev_mapping, struct inode, i_data));
>  	dev->dev_mapping = old_mapping;
>  	mutex_unlock(&dev->struct_mutex);
> -- 
> 1.7.4.1
>
Marco Munderloh April 2, 2013, 10:36 a.m. UTC | #2
> Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your
> issue.

The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod radeon do not end up in a crash anymore. However, I have still no clue why one of these makes 
drm_open to fail. On rmmod radeon I get the following log messages. If don't know if the 'unpin not necessary' has anything to do with it.

[drm] radeon: finishing device.
radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
[TTM] Finalizing pool allocator
[TTM] Finalizing DMA pool allocator
[TTM] Zone  kernel: Used memory at exit: 0 kiB
[TTM] Zone   dma32: Used memory at exit: 0 kiB
[drm] radeon: ttm finalized
vga_switcheroo: disabled
[drm] Module unloaded

By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related.
Ilija Hadzic April 2, 2013, 11:23 a.m. UTC | #3
Thanks for testing. Other issues are probably unrelated, so I'll send the
last version of the patch to Dave.

-- Ilija

On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh <munderl@tnt.uni-hannover.de
> wrote:

> Attached is a v2 of the patch, for reference. I would appreciate if the
>> original reporter or you tested it in lieu of your proposed patch and let
>> me know if it fixes your
>> issue.
>>
>
> The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod
> radeon do not end up in a crash anymore. However, I have still no clue why
> one of these makes drm_open to fail. On rmmod radeon I get the following
> log messages. If don't know if the 'unpin not necessary' has anything to do
> with it.
>
> [drm] radeon: finishing device.
> radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
> radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
> radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
> [TTM] Finalizing pool allocator
> [TTM] Finalizing DMA pool allocator
> [TTM] Zone  kernel: Used memory at exit: 0 kiB
> [TTM] Zone   dma32: Used memory at exit: 0 kiB
> [drm] radeon: ttm finalized
> vga_switcheroo: disabled
> [drm] Module unloaded
>
> By the way, sometimes my r8169 ethernet controller does not survive
> suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't
> know if this is related.
>
>
Marco Munderloh April 2, 2013, 12:01 p.m. UTC | #4
Hi Ilija,

> Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave.

I came across another problem which seems related. rmmod radeon works, however, modprobe radeon afterwards results in a crash (divide error), see attachment.

Best, Marco

On 02.04.2013 13:23, Ilija Hadzic wrote:
>
> -- Ilija
>
> On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh <munderl@tnt.uni-hannover.de <mailto:munderl@tnt.uni-hannover.de>> wrote:
>
>         Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it
>         fixes your
>         issue.
>
>
>     The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod radeon do not end up in a crash anymore. However, I have still no clue why one of these makes
>     drm_open to fail. On rmmod radeon I get the following log messages. If don't know if the 'unpin not necessary' has anything to do with it.
>
>     [drm] radeon: finishing device.
>     radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
>     radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
>     radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
>     [TTM] Finalizing pool allocator
>     [TTM] Finalizing DMA pool allocator
>     [TTM] Zone  kernel: Used memory at exit: 0 kiB
>     [TTM] Zone   dma32: Used memory at exit: 0 kiB
>     [drm] radeon: ttm finalized
>     vga_switcheroo: disabled
>     [drm] Module unloaded
>
>     By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related.
>
>
Alex Deucher April 2, 2013, 1:48 p.m. UTC | #5
On Tue, Apr 2, 2013 at 9:31 AM, Ilija Hadzic
<ihadzic@research.bell-labs.com> wrote:
>
> Marco,
>
> What makes you think that the crash after second modprobe is related to the
> mappings pointers in DRM module? Can you actually establish the correlation
> between these patches and the crash or you are just suspecting because your
> other bug had something to do with module removal/insertion?
>
> If it's the latter, then you may want to open another bug report here
> https://bugs.freedesktop.org/ (use DRI for product and pick DRM/radeon for
> component) and have this issue tracked and addressed separately.
>
> The divide error that your log shows apparently happens at this line
> inside r6xx_remap_render_backend:
>
> pipe_rb_ratio = rendering_pipe_num / req_rb_num;
>
> I would suspect that req_rb_num somehow evaluates to zero at the second
> modprobe. That variable seems to be the derived of the last three arguments
> to r6xx_remap_render_backend. If I look at the caller (evergreen_gpu_init)
> the arguments that have the play here are all derived from the GPU's
> hardware registers (or are the constant for a given GPU device). So I
> suspect that the GPU driver leaves some state in GPU at module removal that
> later bites you.

Newer kernels have a fix for this.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f689e3acbd2e48cc4101e0af454193f81af4baaf

Alex

>
> -- Ilija
>
> On Tue, 2 Apr 2013, Marco Munderloh wrote:
>
>> Hi Ilija,
>>
>>> Thanks for testing. Other issues are probably unrelated, so I'll send the
>>> last version of the patch to Dave.
>>
>>
>> I came across another problem which seems related. rmmod radeon works,
>> however, modprobe radeon afterwards results in a crash (divide error), see
>> attachment.
>>
>> Best, Marco
>>
>> On 02.04.2013 13:23, Ilija Hadzic wrote:
>>>
>>>
>>> -- Ilija
>>>
>>> On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh
>>> <munderl@tnt.uni-hannover.de <mailto:munderl@tnt.uni-hannover.de>> wrote:
>>>
>>>         Attached is a v2 of the patch, for reference. I would appreciate
>>> if the original reporter or you tested it in lieu of your proposed patch and
>>> let me know if it
>>>         fixes your
>>>         issue.
>>>
>>>
>>>     The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as
>>> rmmod radeon do not end up in a crash anymore. However, I have still no clue
>>> why one of these makes
>>>     drm_open to fail. On rmmod radeon I get the following log messages.
>>> If don't know if the 'unpin not necessary' has anything to do with it.
>>>
>>>     [drm] radeon: finishing device.
>>>     radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
>>>     radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
>>>     radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
>>>     [TTM] Finalizing pool allocator
>>>     [TTM] Finalizing DMA pool allocator
>>>     [TTM] Zone  kernel: Used memory at exit: 0 kiB
>>>     [TTM] Zone   dma32: Used memory at exit: 0 kiB
>>>     [drm] radeon: ttm finalized
>>>     vga_switcheroo: disabled
>>>     [drm] Module unloaded
>>>
>>>     By the way, sometimes my r8169 ethernet controller does not survive
>>> suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't
>>> know if this is related.
>>>
>>>
>>
>> --
>> Dipl.-Ing. Marco Munderloh             Mail: munderl@tnt.uni-hannover.de
>> Institut für Informationsverarbeitung (TNT)     Phone: +49 511 762-19587
>> Leibniz Universitaet Hannover, Appelstr. 9a       Fax: +49 511 762- 5333
>> 30167 Hannover, Germany     Web: http://www.tnt.uni-hannover.de/~munderl
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 13fdcd1..429e07d 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -123,6 +123,7 @@  int drm_open(struct inode *inode, struct file *filp)
 	int retcode = 0;
 	int need_setup = 0;
 	struct address_space *old_mapping;
+	struct address_space *old_imapping;
 
 	minor = idr_find(&drm_minors_idr, minor_id);
 	if (!minor)
@@ -137,6 +138,7 @@  int drm_open(struct inode *inode, struct file *filp)
 	if (!dev->open_count++)
 		need_setup = 1;
 	mutex_lock(&dev->struct_mutex);
+	old_imapping = inode->i_mapping;
 	old_mapping = dev->dev_mapping;
 	if (old_mapping == NULL)
 		dev->dev_mapping = &inode->i_data;
@@ -159,8 +161,8 @@  int drm_open(struct inode *inode, struct file *filp)
 
 err_undo:
 	mutex_lock(&dev->struct_mutex);
-	filp->f_mapping = old_mapping;
-	inode->i_mapping = old_mapping;
+	filp->f_mapping = old_imapping;
+	inode->i_mapping = old_imapping;
 	iput(container_of(dev->dev_mapping, struct inode, i_data));
 	dev->dev_mapping = old_mapping;
 	mutex_unlock(&dev->struct_mutex);