"mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

Message ID CAMuHMdUJSfrZ=2zy88_zojDek3CHEWKhv_qoJAVgDpPWz8V=Ew@mail.gmail.com (mailing list archive)
State Not Applicable
Delegated to: Geert Uytterhoeven

Commit Message

Geert Uytterhoeven April 18, 2017, 7:56 p.m. UTC
Hi all,

On Sat, Apr 8, 2017 at 10:48 AM, Linux Kernel Mailing List
<linux-kernel@vger.kernel.org> wrote:
> Web:        https://git.kernel.org/torvalds/c/ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> Commit:     ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> Parent:     cdcf4330d5660998d06fcd899b443693ab3d652f
> Refname:    refs/heads/master
> Author:     Michal Hocko <mhocko@suse.com>
> AuthorDate: Fri Apr 7 16:05:05 2017 -0700
> Committer:  Linus Torvalds <torvalds@linux-foundation.org>
> CommitDate: Sat Apr 8 00:47:49 2017 -0700
>
>     mm: move pcp and lru-pcp draining into single wq
>
>     We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
>     vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
>     per cpu lru caches.  This seems more than necessary because both can run
>     on a single WQ.  Both do not block on locks requiring a memory
>     allocation nor perform any allocations themselves.  We will save one
>     rescuer thread this way.
>
>     On the other hand drain_all_pages() queues work on the system wq which
>     doesn't have a rescuer and so this depends on memory allocation (when
>     all workers are stuck allocating and new ones cannot be created).
>
>     Initially we thought this would be more of a theoretical problem but
>     Hugh Dickins has reported:
>
>     : 4.11-rc has been giving me hangs after hours of swapping load.  At
>     : first they looked like memory leaks ("fork: Cannot allocate memory");
>     : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
>     : before looking at /proc/meminfo one time, and the stat_refresh stuck
>     : in D state, waiting for completion of flush_work like many kworkers.
>     : kthreadd waiting for completion of flush_work in drain_all_pages().
>
>     This worker should be using WQ_RECLAIM as well in order to guarantee
>     forward progress.  We can reuse the same one as for lru draining and
>     vmstat.
>
>     Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
>     Signed-off-by: Michal Hocko <mhocko@suse.com>
>     Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>     Acked-by: Vlastimil Babka <vbabka@suse.cz>
>     Acked-by: Mel Gorman <mgorman@suse.de>
>     Tested-by: Yang Li <pku.leo@gmail.com>
>     Tested-by: Hugh Dickins <hughd@google.com>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This commit broke resume from s2ram on some of my Renesas ARM boards.
On some boards the hang is 100% reproducible, on others it's intermittent
(which was a PITA, as I had to bisect another independent s2ram bug as well).

On r8a7791/koelsch (the full log diff is in the Patch section at the bottom of this page):

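For context on the rescuer argument in the commit message above: a WQ_MEM_RECLAIM workqueue pre-allocates a dedicated rescuer thread at creation time, so work queued on it can still be executed when the machine is too short on memory to fork new kworkers; the system workqueue has no rescuer, which is why queueing drain_all_pages() work there could stall under memory pressure. A minimal sketch of the guarantee (illustrative only; the "example_*" names are invented, not from the patch):

	#include <linux/init.h>
	#include <linux/workqueue.h>

	static void example_drain_fn(struct work_struct *work)
	{
		/* short, non-blocking per-cpu drain work goes here */
	}
	static DECLARE_WORK(example_work, example_drain_fn);
	static struct workqueue_struct *example_wq;

	static int __init example_init(void)
	{
		/* WQ_MEM_RECLAIM creates the rescuer up front, so queued
		 * work is guaranteed forward progress even when all other
		 * kworkers are stuck in allocations: */
		example_wq = alloc_workqueue("example_wq", WQ_MEM_RECLAIM, 0);
		if (!example_wq)
			return -ENOMEM;
		queue_work(example_wq, &example_work);
		return 0;
	}

By contrast, schedule_work(&example_work) would put the item on system_wq, which offers no such guarantee.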
Comments

Michal Hocko April 18, 2017, 8:19 p.m. UTC | #1
On Tue 18-04-17 21:56:56, Geert Uytterhoeven wrote:
> Hi all,
> 
> On Sat, Apr 8, 2017 at 10:48 AM, Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org> wrote:
> > Web:        https://git.kernel.org/torvalds/c/ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Commit:     ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
[...]
> 
> This commit broke resume from s2ram on some of my Renesas ARM boards.
> On some boards the hang is 100% reproducible, on others it's intermittent
> (which was a PITA, as I had to bisect another independent s2ram bug as well).

Hmm, I am rather confused as to how the above commit could change anything
here. Your lockup detector is hitting

dpm_wait_for_superior
        dpm_wait(dev->parent, async);
        dpm_wait_for_suppliers(dev, async);

which in turn waits in wait_for_completion(&dev->power.completion).

The above commit reduced the load on the system WQ. It also removed one
WQ and reused the existing one. The work done on mm_percpu_wq doesn't
block, so I suspect that what you are seeing is just a real bug somewhere
else showing up. I will have a look tomorrow. Let's add Tejun; maybe I
have introduced some subtle dependency which is not clear to me.

> 
> On r8a7791/koelsch:
[... log diff, stack traces, and lock listing snipped; quoted in full in the Patch section at the bottom of this page ...]
Tetsuo Handa April 19, 2017, 5:41 a.m. UTC | #2
Geert Uytterhoeven wrote:
> 8 locks held by s2ram/1899:
>  #0:  (sb_writers#7){.+.+.+}, at: [<ffffff80081ca1a4>] vfs_write+0xa8/0x15c
>  #1:  (&of->mutex){+.+.+.}, at: [<ffffff8008245964>] kernfs_fop_write+0xf0/0x194
>  #2:  (s_active#48){.+.+.+}, at: [<ffffff800824596c>] kernfs_fop_write+0xf8/0x194
>  #3:  (pm_mutex){+.+.+.}, at: [<ffffff80081059a4>] pm_suspend+0x16c/0xabc
>  #4:  (&dev->mutex){......}, at: [<ffffff80083d4920>] device_resume+0x58/0x190
>  #5:  (cma_mutex){+.+...}, at: [<ffffff80081c516c>] cma_alloc+0x150/0x374
>  #6:  (lock){+.+...}, at: [<ffffff800818b8ec>] lru_add_drain_all+0x4c/0x1b4
>  #7:  (cpu_hotplug.dep_map){++++++}, at: [<ffffff80080ab8f4>] get_online_cpus+0x3c/0x9c

I think this situation suggests the following call sequence:

int pm_suspend(suspend_state_t state) {
  error = enter_state(state) {
    if (!mutex_trylock(&pm_mutex)) /* #3 */
      return -EBUSY;
    error = suspend_devices_and_enter(state) {
      error = suspend_enter(state, &wakeup) {
        enable_nonboot_cpus() {
          cpu_maps_update_begin() {
            mutex_lock(&cpu_add_remove_lock);
          }
          pr_info("Enabling non-boot CPUs ...\n");
          for_each_cpu(cpu, frozen_cpus) {
            error = _cpu_up(cpu, 1, CPUHP_ONLINE) {
              cpu_hotplug_begin() {
                mutex_lock(&cpu_hotplug.lock);
              }
              
              cpu_hotplug_done() {
                mutex_unlock(&cpu_hotplug.lock);
              }
            }
            if (!error) {
              pr_info("CPU%d is up\n", cpu);
              continue;
            }
          }
          cpu_maps_update_done() {
             mutex_unlock(&cpu_add_remove_lock);
          }
        }
      }
      dpm_resume_end(PMSG_RESUME) {
        dpm_resume(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_suspended_list)) {
            mutex_unlock(&dpm_list_mtx);
            error = device_resume(dev, state, false) {
              dpm_wait_for_superior(dev, async);
              dpm_watchdog_set(&wd, dev);
              device_lock(dev) {
                mutex_lock(&dev->mutex); /* #4 */
              }
              error = dpm_run_callback(callback, dev, state, info) {
                cma_alloc() {
                  mutex_lock(&cma_mutex); /* #5 */
                  alloc_contig_range() {
                    lru_add_drain_all() {
                      mutex_lock(&lock); /* #6 */
                      get_online_cpus() {
                        mutex_lock(&cpu_hotplug.lock); /* #7 hang? */
                        mutex_unlock(&cpu_hotplug.lock);
                      }
                      put_online_cpus();
                      mutex_unlock(&lock); /* #6 */
                    }
                  }
                  mutex_unlock(&cma_mutex); /* #5 */
                }
              }
              device_unlock(dev) {
                mutex_unlock(&dev->mutex); /* #4 */
              }
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
        dpm_complete(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_prepared_list)) {
            mutex_unlock(&dpm_list_mtx);
            device_complete(dev, state) {
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
      }
    }
    mutex_unlock(&pm_mutex); /* #3 */
  }
}

Is somebody waiting forever with cpu_hotplug.lock held?
I think a full dmesg with SysRq-t output would be appreciated.
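For readers tracing locks #6 and #7 above: 4.11's lru_add_drain_all() (mm/swap.c; an abridged sketch reconstructed from that era's sources, so details may differ slightly) takes a static mutex, then the cpu hotplug read lock, queues per-cpu drain work on mm_percpu_wq, and then blocks in flush_work() until every queued item has run:

	void lru_add_drain_all(void)
	{
		static DEFINE_MUTEX(lock);
		static struct cpumask has_work;
		int cpu;

		mutex_lock(&lock);        /* lock #6 in the listing above */
		get_online_cpus();        /* lock #7 in the listing above */
		cpumask_clear(&has_work);

		for_each_online_cpu(cpu) {
			struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

			if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
			    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
			    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
			    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
			    need_activate_page_drain(cpu)) {
				INIT_WORK(work, lru_add_drain_per_cpu);
				queue_work_on(cpu, mm_percpu_wq, work);
				cpumask_set_cpu(cpu, &has_work);
			}
		}

		/* s2ram is stuck here: the queued work items never run
		 * while mm_percpu_wq is not executing work. */
		for_each_cpu(cpu, &has_work)
			flush_work(&per_cpu(lru_add_drain_work, cpu));

		put_online_cpus();
		mutex_unlock(&lock);
	}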
Michal Hocko April 19, 2017, 7:10 a.m. UTC | #3
On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?

Why would that matter for drain_all_pages? It hasn't used
get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
cpu hotplug locks inside the allocator"), and ce612879ddc7 ("mm: move
pcp and lru-pcp draining into single wq") was merged later.
Geert Uytterhoeven April 19, 2017, 7:16 a.m. UTC | #4
Hi Tetsuo,

On Wed, Apr 19, 2017 at 7:41 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Geert Uytterhoeven wrote:
>> 8 locks held by s2ram/1899:
>>  #0:  (sb_writers#7){.+.+.+}, at: [<ffffff80081ca1a4>] vfs_write+0xa8/0x15c
>>  #1:  (&of->mutex){+.+.+.}, at: [<ffffff8008245964>] kernfs_fop_write+0xf0/0x194
>>  #2:  (s_active#48){.+.+.+}, at: [<ffffff800824596c>] kernfs_fop_write+0xf8/0x194
>>  #3:  (pm_mutex){+.+.+.}, at: [<ffffff80081059a4>] pm_suspend+0x16c/0xabc
>>  #4:  (&dev->mutex){......}, at: [<ffffff80083d4920>] device_resume+0x58/0x190
>>  #5:  (cma_mutex){+.+...}, at: [<ffffff80081c516c>] cma_alloc+0x150/0x374
>>  #6:  (lock){+.+...}, at: [<ffffff800818b8ec>] lru_add_drain_all+0x4c/0x1b4
>>  #7:  (cpu_hotplug.dep_map){++++++}, at: [<ffffff80080ab8f4>] get_online_cpus+0x3c/0x9c
>
> I think this situation suggests that

[...]

> Somebody is waiting forever with cpu_hotplug.lock held?
> I think that full dmesg with SysRq-t output is appreciated.

As SysRq doesn't work with my serial console, I added calls to show_state()
and show_workqueue_state() to check_hung_task().
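That local debug hack presumably looks something like the following (a hypothetical reconstruction, not a posted patch; the context line is from 4.11-era kernel/hung_task.c, so the exact placement may differ):

	--- a/kernel/hung_task.c
	+++ b/kernel/hung_task.c
	@@ ... @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
	 	sched_show_task(t);
	+	/* No working SysRq on this console, so dump everything:
	+	 * all task backtraces (as SysRq-t would) and workqueue state. */
	+	show_state();
	+	show_workqueue_state();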

Result with current linus/master attached.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Tetsuo Handa April 19, 2017, 7:26 a.m. UTC | #5
> On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> [...]
> > Somebody is waiting forever with cpu_hotplug.lock held?
> 
> Why would that matter for drain_all_pages? It doesn't use
> get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> pcp and lru-pcp draining into single wq") was merged later.
> 

Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq"),
it merged the "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue and
the "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into a single
"mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue:

-       lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
-       vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+       mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+                                      WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);

That means "lru-add-drain" became freezable, doesn't it? And this problem
occurs around the resume operation, while all freezable threads are still
frozen? Then lru_add_drain_per_cpu() cannot run because mm_percpu_wq is
frozen?
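If that analysis is right, the deadlock needs no second party: the freezer freezes mm_percpu_wq during suspend; on resume, dpm_resume() runs device callbacks before tasks and freezable workqueues are thawed; a resume callback reaches cma_alloc() -> alloc_contig_range() -> lru_add_drain_all(), which queues lru_add_drain_per_cpu() on the still-frozen mm_percpu_wq and then waits in flush_work() for work that can never execute. A minimal candidate fix, sketched against the hunk quoted above (the obvious change, not necessarily what was eventually merged), would be to drop WQ_FREEZABLE from the merged workqueue:

	-       mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
	-                                      WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
	+       /* Must not be freezable: lru_add_drain_all() can be called,
	+        * and flushed, from device resume callbacks while freezable
	+        * workqueues are still frozen. */
	+       mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);

The cost is that vmstat updates, which were freezable before the merge, would then also run across suspend/resume; whether that is safe is exactly the kind of subtle dependency Michal wanted Tejun's opinion on.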
Patch

--- /tmp/good   2017-04-18 21:47:04.457156167 +0200
+++ /tmp/bad    2017-04-18 21:43:26.215240325 +0200
@@ -13,11 +13,178 @@  Enabling non-boot CPUs ...
 CPU1 is up
 PM: noirq resume of devices complete after N.N msecs
 PM: early resume of devices complete after N.N msecs
-Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=-1)
-PM: resume of devices complete after N.N msecs
-PM: resume devices took N.N seconds
-PM: Finishing wakeup.
-Restarting tasks ... done.
-ata1: link resume succeeded after 1 retries
-ata1: SATA link down (SStatus 0 SControl 300)
-sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
+INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
+      Not tainted 4.11.0-rc7-koelsch-00426-g70412b99f7936b37 #3470
+"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+kworker/u4:0    D    0     5      2 0x00000000
+Workqueue: events_unbound async_run_entry_fn
+[<c0712c58>] (__schedule) from [<c0712f7c>] (schedule+0xb0/0xcc)
+[<c0712f7c>] (schedule) from [<c0717140>] (schedule_timeout+0x18/0x1f4)
+[<c0717140>] (schedule_timeout) from [<c07139f8>] (wait_for_common+0x100/0x19c)
+[<c07139f8>] (wait_for_common) from [<c04d8488>] (dpm_wait_for_superior+0x14/0x5c)
+[<c04d8488>] (dpm_wait_for_superior) from [<c04d8aa4>] (device_resume+0x40/0x1a0)
+[<c04d8aa4>] (device_resume) from [<c04d8c1c>] (async_resume+0x18/0x44)
+[<c04d8c1c>] (async_resume) from [<c023db34>] (async_run_entry_fn+0x44/0x114)
+[<c023db34>] (async_run_entry_fn) from [<c0236544>] (process_one_work+0x1cc/0x31c)
+[<c0236544>] (process_one_work) from [<c0236ca0>] (worker_thread+0x2b8/0x3f0)
+[<c0236ca0>] (worker_thread) from [<c023b240>] (kthread+0x120/0x140)
+[<c023b240>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
+INFO: task kworker/u4:1:125 blocked for more than 120 seconds.
+      Not tainted 4.11.0-rc7-koelsch-00426-g70412b99f7936b37 #3470
+"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+kworker/u4:1    D    0   125      2 0x00000000
+Workqueue: events_unbound async_run_entry_fn
+[<c0712c58>] (__schedule) from [<c0712f7c>] (schedule+0xb0/0xcc)
+[<c0712f7c>] (schedule) from [<c0717140>] (schedule_timeout+0x18/0x1f4)
+[<c0717140>] (schedule_timeout) from [<c07139f8>] (wait_for_common+0x100/0x19c)
+[<c07139f8>] (wait_for_common) from [<c04d8488>] (dpm_wait_for_superior+0x14/0x5c)
+[<c04d8488>] (dpm_wait_for_superior) from [<c04d8aa4>] (device_resume+0x40/0x1a0)
+[<c04d8aa4>] (device_resume) from [<c04d8c1c>] (async_resume+0x18/0x44)
+[<c04d8c1c>] (async_resume) from [<c023db34>] (async_run_entry_fn+0x44/0x114)
+[<c023db34>] (async_run_entry_fn) from [<c0236544>] (process_one_work+0x1cc/0x31c)
+[<c0236544>] (process_one_work) from [<c0236ca0>] (worker_thread+0x2b8/0x3f0)
+[<c0236ca0>] (worker_thread) from [<c023b240>] (kthread+0x120/0x140)
+[<c023b240>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
...

On r8a7795/salvator-x, where I have working lockdep:

PM: noirq resume of devices complete after 131.415 msecs
PM: early resume of devices complete after 8.894 msecs
INFO: task kworker/u16:2:276 blocked for more than 120 seconds.
      Not tainted 4.11.0-rc7-salvator-x-06706-g70412b99f7936b37 #1220
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:2   D    0   276      2 0x00000000
Workqueue: events_unbound async_run_entry_fn
Call trace:
[<ffffff800808612c>] __switch_to+0xa0/0xac
[<ffffff800866df14>] __schedule+0x70c/0xb88
[<ffffff800866e414>] schedule+0x84/0xa4
[<ffffff800867287c>] schedule_timeout+0x30/0x400
[<ffffff800866ee94>] wait_for_common+0x164/0x1a8
[<ffffff800866eeec>] wait_for_completion+0x14/0x1c
[<ffffff80083d3f84>] dpm_wait+0x30/0x38
[<ffffff80083d3ff8>] dpm_wait_for_superior+0x28/0x7c
[<ffffff80083d490c>] device_resume+0x44/0x190
[<ffffff80083d4a7c>] async_resume+0x24/0x54
[<ffffff80080d5360>] async_run_entry_fn+0x4c/0x12c
[<ffffff80080cb59c>] process_one_work+0x340/0x66c
[<ffffff80080cc9ec>] worker_thread+0x274/0x39c
[<ffffff80080d2004>] kthread+0x120/0x128
[<ffffff8008083090>] ret_from_fork+0x10/0x40

Showing all locks held in the system:
2 locks held by khungtaskd/52:
 #0:  (rcu_read_lock){......}, at: [<ffffff80081436cc>] watchdog+0xc0/0x618
 #1:  (tasklist_lock){.+.+..}, at: [<ffffff80080fb5f0>] debug_show_all_locks+0x68/0x18c
2 locks held by kworker/u16:2/276:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:3/291:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
8 locks held by s2ram/1899:
 #0:  (sb_writers#7){.+.+.+}, at: [<ffffff80081ca1a4>] vfs_write+0xa8/0x15c
 #1:  (&of->mutex){+.+.+.}, at: [<ffffff8008245964>] kernfs_fop_write+0xf0/0x194
 #2:  (s_active#48){.+.+.+}, at: [<ffffff800824596c>] kernfs_fop_write+0xf8/0x194
 #3:  (pm_mutex){+.+.+.}, at: [<ffffff80081059a4>] pm_suspend+0x16c/0xabc
 #4:  (&dev->mutex){......}, at: [<ffffff80083d4920>] device_resume+0x58/0x190
 #5:  (cma_mutex){+.+...}, at: [<ffffff80081c516c>] cma_alloc+0x150/0x374
 #6:  (lock){+.+...}, at: [<ffffff800818b8ec>] lru_add_drain_all+0x4c/0x1b4
 #7:  (cpu_hotplug.dep_map){++++++}, at: [<ffffff80080ab8f4>] get_online_cpus+0x3c/0x9c
2 locks held by kworker/u16:1/1918:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:4/1919:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:5/1920:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:7/1922:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:9/1924:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:10/1925:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:11/1926:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:12/1927:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:13/1928:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:14/1929:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
2 locks held by kworker/u16:16/1931:
 #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c
 #1:  ((&entry->work)){+.+...}, at: [<ffffff80080cb424>] process_one_work+0x1c8/0x66c

Thanks for your comments!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds