Message ID | Pine.GSO.4.64.1304011249370.27674@umail (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon 01-04-13 13:14:50, Ilija Hadzic wrote: > > > On Sun, 31 Mar 2013, Michal Hocko wrote: > > >On Sat 30-03-13 18:26:53, Ilija Hadzic wrote: > >>This looks a bit like a hack and it doesn't look right, > >>conceptually. If the call fails, it should restore things as if > >>nothing has ever happened and overwriting old_mapping is not going to > >>do the trick. > > > >OK, I thought this is what the patch does as it falls back to > >&inode->i_data which is the default mapping for all inodes or it uses > >what used to be in device mapping. > > > >I am obviously not familiar with the drm code but it feels a bit strange > >that the device mapping can be different than inode's resp. file's one > > The reason for this is explained in commit message associated with > 949c4a34. > > In summary, the device's mapping is that of the inode associated with the > first opener. Before 949c4a34, subsequent openers would have to come in > through exactly the same inode that the first opener came in > (otherwise the open call would fail). So if a user did something > like: start X, remove /dev/dri/cardN file, mknod the same file > again, the applications started after such an action would stop > working. Also, using the GPU from chroot-ed environment was not > possible if there was another opener from different root. Oh, I see. Thanks for the clarification. > The 949c4a34, removed this restriction, but introduced a problem > with VmWare GPU drivers, which fdb40a08. However, fdb40a08 > introduced the bug that you have reported. > > The problem that I have with your proposed fix is that if the first > opener fails, it can set the device's mapping to that of the inode > that was never used and never opened (and could even be removed > later down the road). Makes sense. > >and even more confusing that inode and file are saved separately. > > > > I was trying to quickly get out the patch that was safe in terms of > introducing new breakage. So the "conservative" thing to do (without > having to think through all possible scenarios) was to restore each > of the three pointers from their own temporary variable. Thinking > about it, you are probably right that file descriptor's and inode's > mapping pointer are equal when open call is entered so we could use > one variable. However, you still need a separate variable to store > the device's mapping pointer because that one can be different. Right. > Attached is a v2 of the patch, for reference. I would appreciate if > the original reporter or you tested it in lieu of your proposed > patch and let me know if it fixes your issue. OK, this is a call for Marco. I have attached this bug to our bugzilla as well (just for reference: https://bugzilla.novell.com/show_bug.cgi?id=807850) > > -- Ilija > From 7e3c832158e2552e5e106a588e2b9e61c35b68f2 Mon Sep 17 00:00:00 2001 > From: Ilija Hadzic <ihadzic@research.bell-labs.com> > Date: Sat, 30 Mar 2013 18:20:35 -0400 > Subject: [PATCH] drm: correctly restore mappings if drm_open fails > > If first drm_open fails, the error-handling path will > incorrectly restore inode's mapping to NULL. This can > cause the crash later on. Fix by separately storing > away mapping pointers that drm_open can touch and > restore each from its own respective variable if the > call fails. > > Reference: > http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html > > v2: use one variable to store file and inode mapping > since they are the same at the function entry; also > fix spelling mistakes in commit message. > > Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de> > Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com> > Cc: Michal Hocko <mhocko@suse.cz> > Cc: stable@vger.kernel.org Feel free to add Reviewed-by: Michal Hocko <mhocko@suse.cz> Thanks! > Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com> > --- > drivers/gpu/drm/drm_fops.c | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > index 13fdcd1..429e07d 100644 > --- a/drivers/gpu/drm/drm_fops.c > +++ b/drivers/gpu/drm/drm_fops.c > @@ -123,6 +123,7 @@ int drm_open(struct inode *inode, struct file *filp) > int retcode = 0; > int need_setup = 0; > struct address_space *old_mapping; > + struct address_space *old_imapping; > > minor = idr_find(&drm_minors_idr, minor_id); > if (!minor) > @@ -137,6 +138,7 @@ int drm_open(struct inode *inode, struct file *filp) > if (!dev->open_count++) > need_setup = 1; > mutex_lock(&dev->struct_mutex); > + old_imapping = inode->i_mapping; > old_mapping = dev->dev_mapping; > if (old_mapping == NULL) > dev->dev_mapping = &inode->i_data; > @@ -159,8 +161,8 @@ int drm_open(struct inode *inode, struct file *filp) > > err_undo: > mutex_lock(&dev->struct_mutex); > - filp->f_mapping = old_mapping; > - inode->i_mapping = old_mapping; > + filp->f_mapping = old_imapping; > + inode->i_mapping = old_imapping; > iput(container_of(dev->dev_mapping, struct inode, i_data)); > dev->dev_mapping = old_mapping; > mutex_unlock(&dev->struct_mutex); > -- > 1.7.4.1 >
> Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your > issue. The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod radeon do not end up in a crash anymore. However, I have still no clue why one of these makes drm_open to fail. On rmmod radeon I get the following log messages. If don't know if the 'unpin not necessary' has anything to do with it. [drm] radeon: finishing device. radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary [TTM] Finalizing pool allocator [TTM] Finalizing DMA pool allocator [TTM] Zone kernel: Used memory at exit: 0 kiB [TTM] Zone dma32: Used memory at exit: 0 kiB [drm] radeon: ttm finalized vga_switcheroo: disabled [drm] Module unloaded By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related.
Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave. -- Ilija On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh <munderl@tnt.uni-hannover.de > wrote: > Attached is a v2 of the patch, for reference. I would appreciate if the >> original reporter or you tested it in lieu of your proposed patch and let >> me know if it fixes your >> issue. >> > > The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod > radeon do not end up in a crash anymore. However, I have still no clue why > one of these makes drm_open to fail. On rmmod radeon I get the following > log messages. If don't know if the 'unpin not necessary' has anything to do > with it. > > [drm] radeon: finishing device. > radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary > radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary > radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary > [TTM] Finalizing pool allocator > [TTM] Finalizing DMA pool allocator > [TTM] Zone kernel: Used memory at exit: 0 kiB > [TTM] Zone dma32: Used memory at exit: 0 kiB > [drm] radeon: ttm finalized > vga_switcheroo: disabled > [drm] Module unloaded > > By the way, sometimes my r8169 ethernet controller does not survive > suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't > know if this is related. > >
Hi Ilija, > Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave. I came across another problem which seems related. rmmod radeon works, however, modprobe radeon afterwards results in a crash (divide error), see attachment. Best, Marco On 02.04.2013 13:23, Ilija Hadzic wrote: > > -- Ilija > > On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh <munderl@tnt.uni-hannover.de <mailto:munderl@tnt.uni-hannover.de>> wrote: > > Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it > fixes your > issue. > > > The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod radeon do not end up in a crash anymore. However, I have still no clue why one of these makes > drm_open to fail. On rmmod radeon I get the following log messages. If don't know if the 'unpin not necessary' has anything to do with it. > > [drm] radeon: finishing device. > radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary > radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary > radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary > [TTM] Finalizing pool allocator > [TTM] Finalizing DMA pool allocator > [TTM] Zone kernel: Used memory at exit: 0 kiB > [TTM] Zone dma32: Used memory at exit: 0 kiB > [drm] radeon: ttm finalized > vga_switcheroo: disabled > [drm] Module unloaded > > By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related. > >
On Tue, Apr 2, 2013 at 9:31 AM, Ilija Hadzic <ihadzic@research.bell-labs.com> wrote: > > Marco, > > What makes you think that the crash after second modprobe is related to the > mappings pointers in DRM module? Can you actually establish the correlation > between these patches and the crash or you are just suspecting because your > other bug had something to do with module removal/insertion? > > If it's the latter, then you may want to open another bug report here > https://bugs.freedesktop.org/ (use DRI for product and pick DRM/radeon for > component) and have this issue tracked and addressed separately. > > The divide error that your log shows apparently happens at this line > inside r6xx_remap_render_backend: > > pipe_rb_ratio = rendering_pipe_num / req_rb_num; > > I would suspect that req_rb_num somehow evaluates to zero at the second > modprobe. That variable seems to be the derived of the last three arguments > to r6xx_remap_render_backend. If I look at the caller (evergreen_gpu_init) > the arguments that have the play here are all derived from the GPU's > hardware registers (or are the constant for a given GPU device). So I > suspect that the GPU driver leaves some state in GPU at module removal that > later bites you. Newer kernels have a fix for this. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f689e3acbd2e48cc4101e0af454193f81af4baaf Alex > > -- Ilija > > On Tue, 2 Apr 2013, Marco Munderloh wrote: > >> Hi Ilija, >> >>> Thanks for testing. Other issues are probably unrelated, so I'll send the >>> last version of the patch to Dave. >> >> >> I came across another problem which seems related. rmmod radeon works, >> however, modprobe radeon afterwards results in a crash (divide error), see >> attachment. >> >> Best, Marco >> >> On 02.04.2013 13:23, Ilija Hadzic wrote: >>> >>> >>> -- Ilija >>> >>> On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh >>> <munderl@tnt.uni-hannover.de <mailto:munderl@tnt.uni-hannover.de>> wrote: >>> >>> Attached is a v2 of the patch, for reference. I would appreciate >>> if the original reporter or you tested it in lieu of your proposed patch and >>> let me know if it >>> fixes your >>> issue. >>> >>> >>> The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as >>> rmmod radeon do not end up in a crash anymore. However, I have still no clue >>> why one of these makes >>> drm_open to fail. On rmmod radeon I get the following log messages. >>> If don't know if the 'unpin not necessary' has anything to do with it. >>> >>> [drm] radeon: finishing device. >>> radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary >>> radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary >>> radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary >>> [TTM] Finalizing pool allocator >>> [TTM] Finalizing DMA pool allocator >>> [TTM] Zone kernel: Used memory at exit: 0 kiB >>> [TTM] Zone dma32: Used memory at exit: 0 kiB >>> [drm] radeon: ttm finalized >>> vga_switcheroo: disabled >>> [drm] Module unloaded >>> >>> By the way, sometimes my r8169 ethernet controller does not survive >>> suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't >>> know if this is related. >>> >>> >> >> -- >> Dipl.-Ing. Marco Munderloh Mail: munderl@tnt.uni-hannover.de >> Institut für Informationsverarbeitung (TNT) Phone: +49 511 762-19587 >> Leibniz Universitaet Hannover, Appelstr. 9a Fax: +49 511 762- 5333 >> 30167 Hannover, Germany Web: http://www.tnt.uni-hannover.de/~munderl > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel >
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 13fdcd1..429e07d 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -123,6 +123,7 @@ int drm_open(struct inode *inode, struct file *filp) int retcode = 0; int need_setup = 0; struct address_space *old_mapping; + struct address_space *old_imapping; minor = idr_find(&drm_minors_idr, minor_id); if (!minor) @@ -137,6 +138,7 @@ int drm_open(struct inode *inode, struct file *filp) if (!dev->open_count++) need_setup = 1; mutex_lock(&dev->struct_mutex); + old_imapping = inode->i_mapping; old_mapping = dev->dev_mapping; if (old_mapping == NULL) dev->dev_mapping = &inode->i_data; @@ -159,8 +161,8 @@ int drm_open(struct inode *inode, struct file *filp) err_undo: mutex_lock(&dev->struct_mutex); - filp->f_mapping = old_mapping; - inode->i_mapping = old_mapping; + filp->f_mapping = old_imapping; + inode->i_mapping = old_imapping; iput(container_of(dev->dev_mapping, struct inode, i_data)); dev->dev_mapping = old_mapping; mutex_unlock(&dev->struct_mutex);