
[v3,0/8] change sb_writers to use percpu_rw_semaphore

Message ID 20150814171935.GA15042@redhat.com (mailing list archive)
State New, archived

Commit Message

Oleg Nesterov Aug. 14, 2015, 5:19 p.m. UTC
On 08/13, Jan Kara wrote:
>
> Regarding the routing, ideally Al Viro should take these as a VFS
> maintainer.

Al, could you take these patches?

V3 contains only cosmetic changes to address the comments from Jan; I
preserved his acks.

In case you missed all the spam I sent before, let me repeat that
the awful (and currently unneeded) 7/8 will be reverted later. We
need it to ensure that other percpu_rw_semaphore changes routed
via another tree won't break fs/super.c. After that we will add
rcu_sync_dtor(s_writers->rw_sem) into deactivate_locked_super()
and revert this horror.

3/8 documents the lockdep problems we currently have. This is fixed
by the patch below but it depends on xfs ILOCK fixes from Dave, so
I will send it later. Plus another patch which removes the "trylock"
hack in __sb_start_write().

Oleg.

 arch/Kconfig                  |    1 -
 fs/btrfs/transaction.c        |    8 +--
 fs/super.c                    |  184 +++++++++++++++++++---------------------
 fs/xfs/xfs_aops.c             |    6 +-
 include/linux/fs.h            |   23 +++---
 include/linux/percpu-rwsem.h  |   20 +++++
 init/Kconfig                  |    1 -
 kernel/locking/Makefile       |    3 +-
 kernel/locking/percpu-rwsem.c |   13 +++
 lib/Kconfig                   |    3 -
 10 files changed, 136 insertions(+), 126 deletions(-)

--------------------------------------------------------------------------
[PATCH v3 9/8] don't fool lockdep in freeze_super() and thaw_super() paths

sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
false-positives. Now that xfs was fixed by Dave we can remove it and
change freeze_super() and thaw_super() to run with s_writers.rw_sem
locks held; we add two trivial helpers for that, sb_freeze_release()
and sb_freeze_acquire().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
---
 fs/super.c |   37 +++++++++++++++++++++++++------------
 1 files changed, 25 insertions(+), 12 deletions(-)
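
To make the ownership hand-off described in the 9/8 changelog concrete, here
is a minimal user-space sketch in C. It is not kernel code: struct toy_rwsem,
struct toy_sb and every toy_* function are hypothetical stand-ins, the
lockdep_tracked flag only imitates what lockdep records, and SB_FREEZE_LEVELS
is assumed to be 3. It also simplifies freeze_super(), which takes the levels
one at a time with other work in between.

```c
#include <assert.h>
#include <stdbool.h>

#define SB_FREEZE_LEVELS 3	/* assumed value for this sketch */

/* Hypothetical stand-in for one percpu_rw_semaphore plus lockdep's view. */
struct toy_rwsem {
	bool write_locked;	/* the lock itself is held for write */
	bool lockdep_tracked;	/* "lockdep" thinks the current task owns it */
};

struct toy_sb {
	struct toy_rwsem rw_sem[SB_FREEZE_LEVELS];
};

/* Models percpu_down_write(): take the lock and record ownership. */
static void toy_down_write(struct toy_rwsem *sem)
{
	assert(!sem->write_locked);
	sem->write_locked = true;
	sem->lockdep_tracked = true;
}

/* Models percpu_up_write(): only legal while ownership is recorded. */
static void toy_up_write(struct toy_rwsem *sem)
{
	assert(sem->write_locked && sem->lockdep_tracked);
	sem->write_locked = false;
	sem->lockdep_tracked = false;
}

/* Models percpu_rwsem_release(): drop the ownership record, keep the lock. */
static void toy_lockdep_release(struct toy_rwsem *sem)
{
	assert(sem->write_locked && sem->lockdep_tracked);
	sem->lockdep_tracked = false;
}

/* Models percpu_rwsem_acquire(): re-record ownership of a held lock. */
static void toy_lockdep_acquire(struct toy_rwsem *sem)
{
	assert(sem->write_locked && !sem->lockdep_tracked);
	sem->lockdep_tracked = true;
}

/* Freeze: lock every level, then hand ownership away before returning. */
static void toy_freeze(struct toy_sb *sb)
{
	int level;

	for (level = 0; level < SB_FREEZE_LEVELS; level++)
		toy_down_write(&sb->rw_sem[level]);
	/* sync and ->freeze_fs() would run here, with the locks tracked */
	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
		toy_lockdep_release(&sb->rw_sem[level]);
}

/* Thaw: reclaim ownership first, then really unlock in reverse order. */
static void toy_thaw(struct toy_sb *sb)
{
	int level;

	for (level = 0; level < SB_FREEZE_LEVELS; level++)
		toy_lockdep_acquire(&sb->rw_sem[level]);
	/* ->unfreeze_fs() would run here, with the locks tracked again */
	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
		toy_up_write(&sb->rw_sem[level]);
}

/* Helpers for inspecting the toy state. */
static int toy_count_locked(const struct toy_sb *sb)
{
	int level, n = 0;

	for (level = 0; level < SB_FREEZE_LEVELS; level++)
		n += sb->rw_sem[level].write_locked;
	return n;
}

static int toy_count_tracked(const struct toy_sb *sb)
{
	int level, n = 0;

	for (level = 0; level < SB_FREEZE_LEVELS; level++)
		n += sb->rw_sem[level].lockdep_tracked;
	return n;
}
```

Between toy_freeze() and toy_thaw() every level stays write-locked while none
is tracked, which mirrors a frozen filesystem after the freezing task has
returned to userspace.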

Comments

Al Viro Aug. 15, 2015, 7:17 a.m. UTC | #1
On Fri, Aug 14, 2015 at 07:19:35PM +0200, Oleg Nesterov wrote:
> On 08/13, Jan Kara wrote:
> >
> > Regarding the routing, ideally Al Viro should take these as a VFS
> > maintainer.
> 
> Al, could you take these patches?

I can live with that.  Do you have it as a branch in a public git tree?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arthur Marsh Aug. 16, 2015, 1:47 p.m. UTC | #2
Oleg Nesterov wrote on 15/08/15 02:49:
> On 08/13, Jan Kara wrote:
>>
>> Regarding the routing, ideally Al Viro should take these as a VFS
>> maintainer.
>
> Al, could you take these patches?
>
> V3 contains only cosmetic changes to address the comments from Jan; I
> preserved his acks.
>
> In case you missed all the spam I sent before, let me repeat that
> the awful (and currently unneeded) 7/8 will be reverted later. We
> need it to ensure that other percpu_rw_semaphore changes routed
> via another tree won't break fs/super.c. After that we will add
> rcu_sync_dtor(s_writers->rw_sem) into deactivate_locked_super()
> and revert this horror.
>
> 3/8 documents the lockdep problems we currently have. This is fixed
> by the patch below but it depends on xfs ILOCK fixes from Dave, so
> I will send it later. Plus another patch which removes the "trylock"
> hack in __sb_start_write().
>
> Oleg.

Would these patches address the problems I've seen over the last day or
so with a kernel built from Linus' git head, such as:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.2.0-rc6+ (root@victoria) (gcc version 5.2.1 20150808 (Debian 5.2.1-15) ) #11 SMP PREEMPT Sun Aug 16 07:27:00 ACST 2015
...
[ 6000.096107] INFO: task basename:7796 blocked for more than 120 seconds.
[ 6000.096116]       Not tainted 4.2.0-rc6+ #11
[ 6000.096120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6000.096123] basename        D e7b5b180     0  7796   6936 0x00000000
[ 6000.096132]  c0379a84 00000086 c11127a5 e7b5b180 e7b5b5ec 2e0a5fb9 00000557 f5f0b310
[ 6000.096143]  f330b180 e7b5b180 c037a000 f5f0b300 7fffffff c0379a90 c155b740 00000000
[ 6000.096154]  c0379b04 c155fa1d 00000046 c11127a5 00000246 00000000 c0379ab0 c10a569b
[ 6000.096164] Call Trace:
[ 6000.096174]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
[ 6000.096179]  [<c155b740>] schedule+0x30/0x80
[ 6000.096184]  [<c155fa1d>] schedule_timeout+0x2cd/0x5c0
[ 6000.096188]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
[ 6000.096193]  [<c10a569b>] ? trace_hardirqs_on+0xb/0x10
[ 6000.096198]  [<c10d701c>] ? ktime_get+0xac/0x1a0
[ 6000.096202]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
[ 6000.096206]  [<c155ad99>] io_schedule_timeout+0x89/0xf0
[ 6000.096209]  [<c155beb0>] ? bit_wait+0x40/0x40
[ 6000.096213]  [<c155bed5>] bit_wait_io+0x25/0x50
[ 6000.096216]  [<c155bbc9>] __wait_on_bit+0x49/0x70
[ 6000.096219]  [<c155beb0>] ? bit_wait+0x40/0x40
[ 6000.096223]  [<c155bc4d>] out_of_line_wait_on_bit+0x5d/0x70
[ 6000.096226]  [<c155beb0>] ? bit_wait+0x40/0x40
[ 6000.096230]  [<c109b210>] ? autoremove_wake_function+0x40/0x40
[ 6000.096236]  [<c11ed5be>] bh_submit_read+0x7e/0x90
[ 6000.096265]  [<f8321354>] ext4_get_branch+0xa4/0x110 [ext4]
[ 6000.096286]  [<f8321f14>] ext4_ind_map_blocks+0xd4/0xe30 [ext4]
[ 6000.096291]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
[ 6000.096295]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
[ 6000.096300]  [<c155ed53>] ? down_read+0x33/0x50
[ 6000.096315]  [<f82ddc9d>] ext4_map_blocks+0x29d/0x4f0 [ext4]
[ 6000.096319]  [<c10a548b>] ? mark_held_locks+0x5b/0x90
[ 6000.096323]  [<c10a55ec>] ? trace_hardirqs_on_caller+0x12c/0x1d0
[ 6000.096337]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
[ 6000.096358]  [<f832d37b>] ext4_mpage_readpages+0x30b/0x8c0 [ext4]
[ 6000.096372]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
[ 6000.096377]  [<c1156930>] ? __alloc_pages_nodemask+0x9c0/0xa40
[ 6000.096383]  [<c107ed46>] ? preempt_count_sub+0x26/0x70
[ 6000.096397]  [<f82db052>] ext4_readpages+0x32/0x40 [ext4]
[ 6000.096411]  [<f82db020>] ? do_journal_get_write_access+0xb0/0xb0 [ext4]
[ 6000.096416]  [<c115c376>] __do_page_cache_readahead+0x2e6/0x370
[ 6000.096420]  [<c115c233>] ? __do_page_cache_readahead+0x1a3/0x370
[ 6000.096426]  [<c114fb85>] filemap_fault+0x505/0x570
[ 6000.096430]  [<c117bb6f>] ? __do_fault+0x2f/0x80
[ 6000.096435]  [<c117bb6f>] __do_fault+0x2f/0x80
[ 6000.096439]  [<c1560ca7>] ? _raw_spin_unlock+0x27/0x50
[ 6000.096443]  [<c117f412>] handle_mm_fault+0xb22/0x11d0
[ 6000.096448]  [<c104aa7a>] __do_page_fault+0x16a/0x500
[ 6000.096452]  [<c104ae10>] ? __do_page_fault+0x500/0x500
[ 6000.096456]  [<c104ae31>] do_page_fault+0x21/0x30
[ 6000.096460]  [<c156282b>] error_code+0x5f/0x64
[ 6000.096464]  [<c104ae10>] ? __do_page_fault+0x500/0x500
[ 6000.096468] 2 locks held by basename/7796:
[ 6000.096470]  #0:  (&mm->mmap_sem){++++++}, at: [<c104aa25>] __do_page_fault+0x115/0x500
[ 6000.096479]  #1:  (&ei->i_data_sem){++++..}, at: [<f82ddd9b>] ext4_map_blocks+0x39b/0x4f0 [ext4]
[ 6000.096500] INFO: task hddtemp:7797 blocked for more than 120 seconds.
[ 6000.096503]       Not tainted 4.2.0-rc6+ #11
[ 6000.096505] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6000.096508] hddtemp         D e896d100     0  7797   5140 0x00000000
[ 6000.096514]  c02c3a84 00000086 e896d588 e896d100 e896d56c 00000001 c02c3a84 f5f0b310
[ 6000.096525]  c176fb00 e896d100 c02c4000 f5f0b300 7fffffff c02c3a90 c155b740 00000000
[ 6000.096535]  c02c3b04 c155fa1d 00000046 c11127a5 00000246 00000000 c02c3ab0 c10a569b
[ 6000.096546] Call Trace:
[ 6000.096550]  [<c155b740>] schedule+0x30/0x80
[ 6000.096554]  [<c155fa1d>] schedule_timeout+0x2cd/0x5c0
[ 6000.096558]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
[ 6000.096562]  [<c10a569b>] ? trace_hardirqs_on+0xb/0x10
[ 6000.096566]  [<c10d701c>] ? ktime_get+0xac/0x1a0
[ 6000.096569]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
[ 6000.096574]  [<c155ad99>] io_schedule_timeout+0x89/0xf0
[ 6000.096577]  [<c109ad07>] ? prepare_to_wait_exclusive+0x47/0x80
[ 6000.096581]  [<c155beb0>] ? bit_wait+0x40/0x40
[ 6000.096584]  [<c155bed5>] bit_wait_io+0x25/0x50
[ 6000.096587]  [<c155bd12>] __wait_on_bit_lock+0x32/0x80
[ 6000.096591]  [<c155bdbd>] out_of_line_wait_on_bit_lock+0x5d/0x70
[ 6000.096595]  [<c155beb0>] ? bit_wait+0x40/0x40
[ 6000.096598]  [<c109b210>] ? autoremove_wake_function+0x40/0x40
[ 6000.096602]  [<c11ea166>] bh_uptodate_or_lock+0x66/0x70
[ 6000.096623]  [<f8321349>] ext4_get_branch+0x99/0x110 [ext4]
[ 6000.096643]  [<f8321f14>] ext4_ind_map_blocks+0xd4/0xe30 [ext4]
[ 6000.096647]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
[ 6000.096651]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
[ 6000.096656]  [<c155ed53>] ? down_read+0x33/0x50
[ 6000.096671]  [<f82ddc9d>] ext4_map_blocks+0x29d/0x4f0 [ext4]
[ 6000.096675]  [<c10a548b>] ? mark_held_locks+0x5b/0x90
[ 6000.096679]  [<c10a55ec>] ? trace_hardirqs_on_caller+0x12c/0x1d0
[ 6000.096693]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
[ 6000.096713]  [<f832d37b>] ext4_mpage_readpages+0x30b/0x8c0 [ext4]
[ 6000.096727]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
[ 6000.096732]  [<c1156930>] ? __alloc_pages_nodemask+0x9c0/0xa40
[ 6000.096747]  [<f82db052>] ext4_readpages+0x32/0x40 [ext4]
[ 6000.096761]  [<f82db020>] ? do_journal_get_write_access+0xb0/0xb0 [ext4]
[ 6000.096766]  [<c115c376>] __do_page_cache_readahead+0x2e6/0x370
[ 6000.096770]  [<c115c233>] ? __do_page_cache_readahead+0x1a3/0x370
[ 6000.096775]  [<c114fb85>] filemap_fault+0x505/0x570
[ 6000.096779]  [<c117bb6f>] ? __do_fault+0x2f/0x80
[ 6000.096783]  [<c117bb6f>] __do_fault+0x2f/0x80
[ 6000.096787]  [<c1560ca7>] ? _raw_spin_unlock+0x27/0x50
[ 6000.096791]  [<c117f412>] handle_mm_fault+0xb22/0x11d0
[ 6000.096796]  [<c104aa7a>] __do_page_fault+0x16a/0x500
[ 6000.096800]  [<c104ae10>] ? __do_page_fault+0x500/0x500
[ 6000.096803]  [<c104ae31>] do_page_fault+0x21/0x30
[ 6000.096807]  [<c156282b>] error_code+0x5f/0x64
[ 6000.096811]  [<c104ae10>] ? __do_page_fault+0x500/0x500
[ 6000.096815] 2 locks held by hddtemp/7797:
[ 6000.096817]  #0:  (&mm->mmap_sem){++++++}, at: [<c104aa25>] __do_page_fault+0x115/0x500
[ 6000.096825]  #1:  (&ei->i_data_sem){++++..}, at: [<f82ddd9b>] ext4_map_blocks+0x39b/0x4f0 [ext4]
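
For triaging reports like the one above, the "blocked for more than N
seconds" lines are the easiest to pick out mechanically. The following is a
rough sketch in C, assuming the line format shown in this log;
parse_hung_task() is a hypothetical helper written for illustration, not an
existing kernel or libc function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the task name, pid and timeout from a line
 * such as
 *   "[ 6000.096107] INFO: task basename:7796 blocked for more than 120 seconds."
 * Returns 1 on a match and 0 otherwise. The task name is taken up to the
 * last ':' before " blocked for more than ", since names may contain ':'.
 */
static int parse_hung_task(const char *line, char *comm, size_t comm_len,
			   int *pid, int *secs)
{
	static const char prefix[] = "INFO: task ";
	static const char marker[] = " blocked for more than ";
	const char *start, *end, *colon = NULL, *q;
	size_t n;

	assert(line && comm && pid && secs);

	start = strstr(line, prefix);
	if (!start)
		return 0;
	start += strlen(prefix);

	end = strstr(start, marker);
	if (!end)
		return 0;

	/* Find the last ':' separating the task name from the pid. */
	for (q = start; q < end; q++)
		if (*q == ':')
			colon = q;
	if (!colon)
		return 0;

	n = (size_t)(colon - start);
	if (n == 0 || n >= comm_len)
		return 0;
	memcpy(comm, start, n);
	comm[n] = '\0';

	if (sscanf(colon + 1, "%d", pid) != 1)
		return 0;
	if (sscanf(end + strlen(marker), "%d", secs) != 1)
		return 0;
	return 1;
}
```

Feeding each line of a saved dmesg to this helper yields one (name, pid,
timeout) tuple per hung-task report; all other lines, such as the stack
frames and "locks held" entries, are skipped.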


Oleg Nesterov Aug. 17, 2015, 11:35 a.m. UTC | #3
On 08/16, Arthur Marsh wrote:
>
> Would these patches address the problems I've seen over the last day or
> so with a kernel built from Linus' git head, such as:

No, this series shouldn't make any difference.



> [    0.000000] Linux version 4.2.0-rc6+ (root@victoria) (gcc version 5.2.1 20150808 (Debian 5.2.1-15) ) #11 SMP PREEMPT Sun Aug 16 07:27:00 ACST 2015
> ...
> [ 6000.096107] INFO: task basename:7796 blocked for more than 120 seconds.
> [ 6000.096116]       Not tainted 4.2.0-rc6+ #11
> [ 6000.096120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 6000.096123] basename        D e7b5b180     0  7796   6936 0x00000000
> [ 6000.096132]  c0379a84 00000086 c11127a5 e7b5b180 e7b5b5ec 2e0a5fb9 00000557 f5f0b310
> [ 6000.096143]  f330b180 e7b5b180 c037a000 f5f0b300 7fffffff c0379a90 c155b740 00000000
> [ 6000.096154]  c0379b04 c155fa1d 00000046 c11127a5 00000246 00000000 c0379ab0 c10a569b
> [ 6000.096164] Call Trace:
> [ 6000.096174]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
> [ 6000.096179]  [<c155b740>] schedule+0x30/0x80
> [ 6000.096184]  [<c155fa1d>] schedule_timeout+0x2cd/0x5c0
> [ 6000.096188]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
> [ 6000.096193]  [<c10a569b>] ? trace_hardirqs_on+0xb/0x10
> [ 6000.096198]  [<c10d701c>] ? ktime_get+0xac/0x1a0
> [ 6000.096202]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
> [ 6000.096206]  [<c155ad99>] io_schedule_timeout+0x89/0xf0
> [ 6000.096209]  [<c155beb0>] ? bit_wait+0x40/0x40
> [ 6000.096213]  [<c155bed5>] bit_wait_io+0x25/0x50
> [ 6000.096216]  [<c155bbc9>] __wait_on_bit+0x49/0x70
> [ 6000.096219]  [<c155beb0>] ? bit_wait+0x40/0x40
> [ 6000.096223]  [<c155bc4d>] out_of_line_wait_on_bit+0x5d/0x70
> [ 6000.096226]  [<c155beb0>] ? bit_wait+0x40/0x40
> [ 6000.096230]  [<c109b210>] ? autoremove_wake_function+0x40/0x40
> [ 6000.096236]  [<c11ed5be>] bh_submit_read+0x7e/0x90
> [ 6000.096265]  [<f8321354>] ext4_get_branch+0xa4/0x110 [ext4]
> [ 6000.096286]  [<f8321f14>] ext4_ind_map_blocks+0xd4/0xe30 [ext4]
> [ 6000.096291]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
> [ 6000.096295]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
> [ 6000.096300]  [<c155ed53>] ? down_read+0x33/0x50
> [ 6000.096315]  [<f82ddc9d>] ext4_map_blocks+0x29d/0x4f0 [ext4]
> [ 6000.096319]  [<c10a548b>] ? mark_held_locks+0x5b/0x90
> [ 6000.096323]  [<c10a55ec>] ? trace_hardirqs_on_caller+0x12c/0x1d0
> [ 6000.096337]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096358]  [<f832d37b>] ext4_mpage_readpages+0x30b/0x8c0 [ext4]
> [ 6000.096372]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096377]  [<c1156930>] ? __alloc_pages_nodemask+0x9c0/0xa40
> [ 6000.096383]  [<c107ed46>] ? preempt_count_sub+0x26/0x70
> [ 6000.096397]  [<f82db052>] ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096411]  [<f82db020>] ? do_journal_get_write_access+0xb0/0xb0 [ext4]
> [ 6000.096416]  [<c115c376>] __do_page_cache_readahead+0x2e6/0x370
> [ 6000.096420]  [<c115c233>] ? __do_page_cache_readahead+0x1a3/0x370
> [ 6000.096426]  [<c114fb85>] filemap_fault+0x505/0x570
> [ 6000.096430]  [<c117bb6f>] ? __do_fault+0x2f/0x80
> [ 6000.096435]  [<c117bb6f>] __do_fault+0x2f/0x80
> [ 6000.096439]  [<c1560ca7>] ? _raw_spin_unlock+0x27/0x50
> [ 6000.096443]  [<c117f412>] handle_mm_fault+0xb22/0x11d0
> [ 6000.096448]  [<c104aa7a>] __do_page_fault+0x16a/0x500
> [ 6000.096452]  [<c104ae10>] ? __do_page_fault+0x500/0x500
> [ 6000.096456]  [<c104ae31>] do_page_fault+0x21/0x30
> [ 6000.096460]  [<c156282b>] error_code+0x5f/0x64
> [ 6000.096464]  [<c104ae10>] ? __do_page_fault+0x500/0x500
> [ 6000.096468] 2 locks held by basename/7796:
> [ 6000.096470]  #0:  (&mm->mmap_sem){++++++}, at: [<c104aa25>] __do_page_fault+0x115/0x500
> [ 6000.096479]  #1:  (&ei->i_data_sem){++++..}, at: [<f82ddd9b>] ext4_map_blocks+0x39b/0x4f0 [ext4]
> [ 6000.096500] INFO: task hddtemp:7797 blocked for more than 120 seconds.
> [ 6000.096503]       Not tainted 4.2.0-rc6+ #11
> [ 6000.096505] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 6000.096508] hddtemp         D e896d100     0  7797   5140 0x00000000
> [ 6000.096514]  c02c3a84 00000086 e896d588 e896d100 e896d56c 00000001 c02c3a84 f5f0b310
> [ 6000.096525]  c176fb00 e896d100 c02c4000 f5f0b300 7fffffff c02c3a90 c155b740 00000000
> [ 6000.096535]  c02c3b04 c155fa1d 00000046 c11127a5 00000246 00000000 c02c3ab0 c10a569b
> [ 6000.096546] Call Trace:
> [ 6000.096550]  [<c155b740>] schedule+0x30/0x80
> [ 6000.096554]  [<c155fa1d>] schedule_timeout+0x2cd/0x5c0
> [ 6000.096558]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
> [ 6000.096562]  [<c10a569b>] ? trace_hardirqs_on+0xb/0x10
> [ 6000.096566]  [<c10d701c>] ? ktime_get+0xac/0x1a0
> [ 6000.096569]  [<c11127a5>] ? __delayacct_blkio_start+0x15/0x20
> [ 6000.096574]  [<c155ad99>] io_schedule_timeout+0x89/0xf0
> [ 6000.096577]  [<c109ad07>] ? prepare_to_wait_exclusive+0x47/0x80
> [ 6000.096581]  [<c155beb0>] ? bit_wait+0x40/0x40
> [ 6000.096584]  [<c155bed5>] bit_wait_io+0x25/0x50
> [ 6000.096587]  [<c155bd12>] __wait_on_bit_lock+0x32/0x80
> [ 6000.096591]  [<c155bdbd>] out_of_line_wait_on_bit_lock+0x5d/0x70
> [ 6000.096595]  [<c155beb0>] ? bit_wait+0x40/0x40
> [ 6000.096598]  [<c109b210>] ? autoremove_wake_function+0x40/0x40
> [ 6000.096602]  [<c11ea166>] bh_uptodate_or_lock+0x66/0x70
> [ 6000.096623]  [<f8321349>] ext4_get_branch+0x99/0x110 [ext4]
> [ 6000.096643]  [<f8321f14>] ext4_ind_map_blocks+0xd4/0xe30 [ext4]
> [ 6000.096647]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
> [ 6000.096651]  [<c10a6290>] ? __lock_acquire+0x910/0x16a0
> [ 6000.096656]  [<c155ed53>] ? down_read+0x33/0x50
> [ 6000.096671]  [<f82ddc9d>] ext4_map_blocks+0x29d/0x4f0 [ext4]
> [ 6000.096675]  [<c10a548b>] ? mark_held_locks+0x5b/0x90
> [ 6000.096679]  [<c10a55ec>] ? trace_hardirqs_on_caller+0x12c/0x1d0
> [ 6000.096693]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096713]  [<f832d37b>] ext4_mpage_readpages+0x30b/0x8c0 [ext4]
> [ 6000.096727]  [<f82db052>] ? ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096732]  [<c1156930>] ? __alloc_pages_nodemask+0x9c0/0xa40
> [ 6000.096747]  [<f82db052>] ext4_readpages+0x32/0x40 [ext4]
> [ 6000.096761]  [<f82db020>] ? do_journal_get_write_access+0xb0/0xb0 [ext4]
> [ 6000.096766]  [<c115c376>] __do_page_cache_readahead+0x2e6/0x370
> [ 6000.096770]  [<c115c233>] ? __do_page_cache_readahead+0x1a3/0x370
> [ 6000.096775]  [<c114fb85>] filemap_fault+0x505/0x570
> [ 6000.096779]  [<c117bb6f>] ? __do_fault+0x2f/0x80
> [ 6000.096783]  [<c117bb6f>] __do_fault+0x2f/0x80
> [ 6000.096787]  [<c1560ca7>] ? _raw_spin_unlock+0x27/0x50
> [ 6000.096791]  [<c117f412>] handle_mm_fault+0xb22/0x11d0
> [ 6000.096796]  [<c104aa7a>] __do_page_fault+0x16a/0x500
> [ 6000.096800]  [<c104ae10>] ? __do_page_fault+0x500/0x500
> [ 6000.096803]  [<c104ae31>] do_page_fault+0x21/0x30
> [ 6000.096807]  [<c156282b>] error_code+0x5f/0x64
> [ 6000.096811]  [<c104ae10>] ? __do_page_fault+0x500/0x500
> [ 6000.096815] 2 locks held by hddtemp/7797:
> [ 6000.096817]  #0:  (&mm->mmap_sem){++++++}, at: [<c104aa25>] __do_page_fault+0x115/0x500
> [ 6000.096825]  #1:  (&ei->i_data_sem){++++..}, at: [<f82ddd9b>] ext4_map_blocks+0x39b/0x4f0 [ext4]
>
>


Patch

diff --git a/fs/super.c b/fs/super.c
index 4350ff4..91c9756 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1213,25 +1213,34 @@  EXPORT_SYMBOL(__sb_start_write);
 static void sb_wait_write(struct super_block *sb, int level)
 {
 	percpu_down_write(sb->s_writers.rw_sem + level-1);
-	/*
-	 * We are going to return to userspace and forget about this lock, the
-	 * ownership goes to the caller of thaw_super() which does unlock.
-	 *
-	 * FIXME: we should do this before return from freeze_super() after we
-	 * called sync_filesystem(sb) and s_op->freeze_fs(sb), and thaw_super()
-	 * should re-acquire these locks before s_op->unfreeze_fs(sb). However
-	 * this leads to lockdep false-positives, so currently we do the early
-	 * release right after acquire.
-	 */
-	percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
 }
 
-static void sb_freeze_unlock(struct super_block *sb)
+/*
+ * We are going to return to userspace and forget about these locks, the
+ * ownership goes to the caller of thaw_super() which does unlock().
+ */
+static void sb_freeze_release(struct super_block *sb)
+{
+	int level;
+
+	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
+		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+}
+
+/*
+ * Tell lockdep we are holding these locks before we call ->unfreeze_fs(sb).
+ */
+static void sb_freeze_acquire(struct super_block *sb)
 {
 	int level;
 
 	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
 		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+}
+
+static void sb_freeze_unlock(struct super_block *sb)
+{
+	int level;
 
 	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
 		percpu_up_write(sb->s_writers.rw_sem + level);
@@ -1327,6 +1336,7 @@  int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
@@ -1353,11 +1363,14 @@  int thaw_super(struct super_block *sb)
 		goto out;
 	}
 
+	sb_freeze_acquire(sb);
+
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
+			sb_freeze_release(sb);
 			up_write(&sb->s_umount);
 			return error;
 		}