
[0/4,v2] mm/swap: Add locking for pagevec

Message ID 20190424111208.24459-1-bigeasy@linutronix.de (mailing list archive)

Message

Sebastian Andrzej Siewior April 24, 2019, 11:12 a.m. UTC
The swap code synchronizes its access to the (four) pagevec structs
(which are allocated per-CPU) by disabling preemption. This works, and
the one struct that needs to be accessed from interrupt context is
protected by disabling interrupts. This was manually audited and there
is no lockdep coverage for it.
There is one case where the per-CPU data of a remote CPU needs to be
accessed, and this is solved by starting a worker on the remote CPU and
waiting for it to finish.

In v1 [0] it was attempted to add per-CPU spinlocks for the access to
the structs. This would add lockdep coverage and allow access from a
remote CPU, so the worker wouldn't be required.
Concerns were raised about the cost of the uncontended spin_lock(), and
it was argued that the benefit of avoiding the per-CPU worker is small
because that path is rarely used.
A static key has been suggested which enables the per-CPU locking only
under certain circumstances, such as the NOHZ_FULL case, and it is
implemented as part of this series.

Sebastian

Comments

Matthew Wilcox April 24, 2019, 12:15 p.m. UTC | #1
On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> The swap code synchronizes its access to the (four) pagevec struct
> (which is allocated per-CPU) by disabling preemption. This works and the
> one struct needs to be accessed from interrupt context is protected by
> disabling interrupts. This was manually audited and there is no lockdep
> coverage for this.
> There is one case where the per-CPU of a remote CPU needs to be accessed
> and this is solved by started a worker on the remote CPU and waiting for
> it to finish.
> 
> In v1 [0] it was attempted to add per-CPU spinlocks for the access to
> struct. This would add lockdep coverage and access from a remote CPU so
> the worker wouldn't be required.

From my point of view, what is missing from this description is why we
want to be able to access these structs from a remote CPU.  It's explained
a little better in the 4/4 changelog, but I don't see any numbers that
suggest what kinds of gains we might see (eg "reduces power consumption
by x% on a particular setup", or even "average length of time in idle
extended from x ms to y ms").
Sebastian Andrzej Siewior April 26, 2019, 8 a.m. UTC | #2
On 2019-04-24 05:15:52 [-0700], Matthew Wilcox wrote:
> On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> > The swap code synchronizes its access to the (four) pagevec struct
> > (which is allocated per-CPU) by disabling preemption. This works and the
> > one struct needs to be accessed from interrupt context is protected by
> > disabling interrupts. This was manually audited and there is no lockdep
> > coverage for this.
> > There is one case where the per-CPU of a remote CPU needs to be accessed
> > and this is solved by started a worker on the remote CPU and waiting for
> > it to finish.
> > 
> > In v1 [0] it was attempted to add per-CPU spinlocks for the access to
> > struct. This would add lockdep coverage and access from a remote CPU so
> > the worker wouldn't be required.
> 
> From my point of view, what is missing from this description is why we
> want to be able to access these structs from a remote CPU.  It's explained
> a little better in the 4/4 changelog, but I don't see any numbers that
> suggest what kinds of gains we might see (eg "reduces power consumption
> by x% on a particular setup", or even "average length of time in idle
> extended from x ms to y ms").

Pulling a CPU out of idle or out of a userland computation looks bad. In
the first series I had numbers showing how long it takes to run the
drain loop over all per-CPU data from one CPU versus via the workqueue.
Somehow the uncontended lock showed up as bad in the krobot report,
while I never got stable numbers from that test myself.
The other motivation is RT, where we need proper locking and can't use
the preempt-disable based locking.

Sebastian
Marcelo Tosatti June 16, 2020, 4:55 p.m. UTC | #3
On Wed, Apr 24, 2019 at 05:15:52AM -0700, Matthew Wilcox wrote:
> On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> > The swap code synchronizes its access to the (four) pagevec struct
> > (which is allocated per-CPU) by disabling preemption. This works and the
> > one struct needs to be accessed from interrupt context is protected by
> > disabling interrupts. This was manually audited and there is no lockdep
> > coverage for this.
> > There is one case where the per-CPU of a remote CPU needs to be accessed
> > and this is solved by started a worker on the remote CPU and waiting for
> > it to finish.
> > 
> > In v1 [0] it was attempted to add per-CPU spinlocks for the access to
> > struct. This would add lockdep coverage and access from a remote CPU so
> > the worker wouldn't be required.
> 
> From my point of view, what is missing from this description is why we
> want to be able to access these structs from a remote CPU.  It's explained
> a little better in the 4/4 changelog, but I don't see any numbers that
> suggest what kinds of gains we might see (eg "reduces power consumption
> by x% on a particular setup", or even "average length of time in idle
> extended from x ms to y ms").

Willy, 

This is to avoid:

1) Interrupting CPUs which can't afford to execute a workqueue item.

2) A system lockup when running a busy-spinning (on a network
RX descriptor) SCHED_FIFO application:

[ 7475.821066] INFO: task ld:274531 blocked for more than 600 seconds.
[ 7475.822157]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 7475.823094] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[ 7475.824392] ld              D    0 274531 274530 0x00084080
[ 7475.825307] Call Trace:
[ 7475.825761]  __schedule+0x342/0x850
[ 7475.826377]  schedule+0x39/0xd0
[ 7475.826923]  schedule_timeout+0x20e/0x410
[ 7475.827610]  ? __schedule+0x34a/0x850
[ 7475.828247]  ? ___preempt_schedule+0x16/0x18
[ 7475.828953]  wait_for_completion+0x85/0xe0
[ 7475.829653]  flush_work+0x11a/0x1c0
[ 7475.830313]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 7475.831148]  drain_all_pages+0x140/0x190
[ 7475.831803]  __alloc_pages_slowpath+0x3f8/0xe20
[ 7475.832571]  ? mem_cgroup_commit_charge+0xcb/0x510
[ 7475.833371]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 7475.834134]  pagecache_get_page+0xb5/0x2d0
[ 7475.834814]  ? account_page_dirtied+0x11a/0x220
[ 7475.835579]  grab_cache_page_write_begin+0x1f/0x40
[ 7475.836379]  iomap_write_begin.constprop.44+0x1c1/0x370
[ 7475.837241]  ? iomap_write_end+0x91/0x290
[ 7475.837911]  iomap_write_actor+0x92/0x170