From patchwork Tue Sep 11 00:42:33 2018
X-Patchwork-Submitter: Daniel Jordan <daniel.m.jordan@oracle.com>
X-Patchwork-Id: 10594947
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org,
    dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org,
    levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
    mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com,
    tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 1/8] mm, memcontrol.c: make memcg lru stats thread-safe without lru_lock
Date: Mon, 10 Sep 2018 20:42:33 -0400
Message-Id: <20180911004240.4758-2-daniel.m.jordan@oracle.com>
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>

lru_lock needs to be held to update memcg LRU statistics.  This
requirement arises fairly naturally from when the stats are updated:
callers already hold lru_lock at those points.

In preparation for allowing concurrent adds and removes from the LRU,
however, make concurrent updates to these statistics safe without
lru_lock.  The lock continues to be held until later in the series, when
it is replaced with an rwlock that also disables preemption, preserving
the assumption of __mod_lru_zone_size, introduced here, that callers run
with preemption disabled.

Follow the existing pattern for statistics in memcontrol.h by using a
combination of per-cpu counters and atomics.

Remove the negative statistics warning from ca707239e8a7 ("mm:
update_lru_size warn and reset bad lru_size").  Although an earlier
version of this patch updated the warning to account for the error
introduced by the per-cpu counters, Hugh says this warning has not been
seen in the wild and that, for simplicity's sake, it should probably just
be removed.
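The batching scheme described above is simple but easy to get subtly
wrong, so here is a minimal, self-contained user-space sketch of the idea.
All names are invented for illustration; the patch itself uses
this_cpu_read()/this_cpu_write() on a __percpu struct lruvec_stat and
spills into atomic_long_t counters once a delta's magnitude exceeds
MEMCG_CHARGE_BATCH.

/*
 * Sketch: each CPU accumulates a small signed delta and folds it into a
 * shared atomic only once the delta grows past a batch size.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define BATCH   32      /* stands in for the kernel's batch threshold */
#define NR_CPUS 4       /* illustration only */

static atomic_long shared_count;        /* like an atomic_long_t lru_zone_size slot */
static long percpu_delta[NR_CPUS];      /* like the __percpu lruvec_stat copy       */

/* Caller guarantees "cpu" is stable, i.e. preemption is disabled. */
static void mod_count(int cpu, int val)
{
        long x = percpu_delta[cpu] + val;

        if (labs(x) > BATCH) {          /* spill into the shared atomic */
                atomic_fetch_add(&shared_count, x);
                x = 0;
        }
        percpu_delta[cpu] = x;          /* keep the remainder per cpu */
}

/* Readers see a value that is off by at most BATCH * NR_CPUS. */
static long read_count(void)
{
        return atomic_load(&shared_count);
}

int main(void)
{
        for (int i = 0; i < 1000; i++)
                mod_count(i % NR_CPUS, 1);
        printf("approximate count: %ld (true count 1000)\n", read_count());
        return 0;
}

Readers that sum only the shared atomic can be off by at most the batch
size times the number of CPUs, and can even observe a transiently
negative value, which is why the negative-statistics warning is dropped
rather than adjusted.
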
Signed-off-by: Daniel Jordan --- include/linux/memcontrol.h | 43 +++++++++++++++++++++++++++++--------- mm/memcontrol.c | 29 +++++++------------------ 2 files changed, 40 insertions(+), 32 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index d99b71bc2c66..6377dc76dc41 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -99,7 +99,8 @@ struct mem_cgroup_reclaim_iter { }; struct lruvec_stat { - long count[NR_VM_NODE_STAT_ITEMS]; + long node[NR_VM_NODE_STAT_ITEMS]; + long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; }; /* @@ -109,9 +110,8 @@ struct mem_cgroup_per_node { struct lruvec lruvec; struct lruvec_stat __percpu *lruvec_stat_cpu; - atomic_long_t lruvec_stat[NR_VM_NODE_STAT_ITEMS]; - - unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; + atomic_long_t node_stat[NR_VM_NODE_STAT_ITEMS]; + atomic_long_t lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; struct mem_cgroup_reclaim_iter iter[DEF_PRIORITY + 1]; @@ -446,7 +446,7 @@ unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru) mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); for (zid = 0; zid < MAX_NR_ZONES; zid++) - nr_pages += mz->lru_zone_size[zid][lru]; + nr_pages += atomic64_read(&mz->lru_zone_size[zid][lru]); return nr_pages; } @@ -457,7 +457,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, struct mem_cgroup_per_node *mz; mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); - return mz->lru_zone_size[zone_idx][lru]; + return atomic64_read(&mz->lru_zone_size[zone_idx][lru]); } void mem_cgroup_handle_over_high(void); @@ -575,7 +575,7 @@ static inline unsigned long lruvec_page_state(struct lruvec *lruvec, return node_page_state(lruvec_pgdat(lruvec), idx); pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); - x = atomic_long_read(&pn->lruvec_stat[idx]); + x = atomic_long_read(&pn->node_stat[idx]); #ifdef CONFIG_SMP if (x < 0) x = 0; @@ -601,12 +601,12 @@ static inline void __mod_lruvec_state(struct lruvec *lruvec, __mod_memcg_state(pn->memcg, idx, val); /* Update lruvec */ - x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]); + x = val + __this_cpu_read(pn->lruvec_stat_cpu->node[idx]); if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { - atomic_long_add(x, &pn->lruvec_stat[idx]); + atomic_long_add(x, &pn->node_stat[idx]); x = 0; } - __this_cpu_write(pn->lruvec_stat_cpu->count[idx], x); + __this_cpu_write(pn->lruvec_stat_cpu->node[idx], x); } static inline void mod_lruvec_state(struct lruvec *lruvec, @@ -619,6 +619,29 @@ static inline void mod_lruvec_state(struct lruvec *lruvec, local_irq_restore(flags); } +/** + * __mod_lru_zone_size - update memcg lru statistics in batches + * + * Updates memcg lru statistics using per-cpu counters that spill into atomics + * above a threshold. + * + * Assumes that the caller has disabled preemption. IRQs may be enabled + * because this function is not called from irq context. 
+ */ +static inline void __mod_lru_zone_size(struct mem_cgroup_per_node *pn, + enum lru_list lru, int zid, int val) +{ + long x; + struct lruvec_stat __percpu *lruvec_stat_cpu = pn->lruvec_stat_cpu; + + x = val + __this_cpu_read(lruvec_stat_cpu->lru_zone_size[zid][lru]); + if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + atomic_long_add(x, &pn->lru_zone_size[zid][lru]); + x = 0; + } + __this_cpu_write(lruvec_stat_cpu->lru_zone_size[zid][lru], x); +} + static inline void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx, int val) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2bd3df3d101a..5463ad160e10 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -962,36 +962,20 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd * @zid: zone id of the accounted pages * @nr_pages: positive when adding or negative when removing * - * This function must be called under lru_lock, just before a page is added - * to or just after a page is removed from an lru list (that ordering being - * so as to allow it to check that lru_size 0 is consistent with list_empty). + * This function must be called just before a page is added to, or just after a + * page is removed from, an lru list. Callers aren't required to hold lru_lock + * because these statistics use per-cpu counters and atomics. */ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages) { struct mem_cgroup_per_node *mz; - unsigned long *lru_size; - long size; if (mem_cgroup_disabled()) return; mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); - lru_size = &mz->lru_zone_size[zid][lru]; - - if (nr_pages < 0) - *lru_size += nr_pages; - - size = *lru_size; - if (WARN_ONCE(size < 0, - "%s(%p, %d, %d): lru_size %ld\n", - __func__, lruvec, lru, nr_pages, size)) { - VM_BUG_ON(1); - *lru_size = 0; - } - - if (nr_pages > 0) - *lru_size += nr_pages; + __mod_lru_zone_size(mz, lru, zid, nr_pages); } bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg) @@ -1833,9 +1817,10 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) struct mem_cgroup_per_node *pn; pn = mem_cgroup_nodeinfo(memcg, nid); - x = this_cpu_xchg(pn->lruvec_stat_cpu->count[i], 0); + x = this_cpu_xchg(pn->lruvec_stat_cpu->node[i], + 0); if (x) - atomic_long_add(x, &pn->lruvec_stat[i]); + atomic_long_add(x, &pn->node_stat[i]); } } From patchwork Tue Sep 11 00:42:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Jordan X-Patchwork-Id: 10594951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4023A6CB for ; Tue, 11 Sep 2018 00:43:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2D2A2212DA for ; Tue, 11 Sep 2018 00:43:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2146A237A5; Tue, 11 Sep 2018 00:43:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 08901212DA for ; Tue, 11 Sep 2018 
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org,
    dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org,
    levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
    mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com,
    tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 2/8] mm: make zone_reclaim_stat updates thread-safe
Date: Mon, 10 Sep 2018 20:42:34 -0400
Message-Id: <20180911004240.4758-3-daniel.m.jordan@oracle.com>
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>

lru_lock needs to be held to update the zone_reclaim_stat statistics. Similar to the previous patch, this requirement again arises fairly naturally because callers are holding lru_lock already. In preparation for allowing concurrent adds and removes from the LRU, however, make concurrent updates to these statistics safe without lru_lock. The lock continues to be held until later in the series, when it is replaced with a rwlock that also disables preemption, maintaining the assumption in the comment above __update_page_reclaim_stat, which is introduced here. Use a combination of per-cpu counters and atomics. Signed-off-by: Daniel Jordan --- include/linux/mmzone.h | 50 ++++++++++++++++++++++++++++++++++++++++++ init/main.c | 1 + mm/memcontrol.c | 20 ++++++++--------- mm/memory_hotplug.c | 1 + mm/mmzone.c | 14 ++++++++++++ mm/swap.c | 14 ++++++++---- mm/vmscan.c | 42 ++++++++++++++++++++--------------- 7 files changed, 110 insertions(+), 32 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 32699b2dc52a..6d4c23a3069d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -229,6 +229,12 @@ struct zone_reclaim_stat { * * The anon LRU stats live in [0], file LRU stats in [1] */ + atomic_long_t recent_rotated[2]; + atomic_long_t recent_scanned[2]; +}; + +/* These spill into the counters in struct zone_reclaim_stat beyond a cutoff. */ +struct zone_reclaim_stat_cpu { unsigned long recent_rotated[2]; unsigned long recent_scanned[2]; }; @@ -236,6 +242,7 @@ struct zone_reclaim_stat { struct lruvec { struct list_head lists[NR_LRU_LISTS]; struct zone_reclaim_stat reclaim_stat; + struct zone_reclaim_stat_cpu __percpu *reclaim_stat_cpu; /* Evictions & activations on the inactive file list */ atomic_long_t inactive_age; /* Refaults at the time of last reclaim cycle */ @@ -245,6 +252,47 @@ struct lruvec { #endif }; +#define RECLAIM_STAT_BATCH 32U /* From SWAP_CLUSTER_MAX */ + +/* + * Callers of the below three functions that update reclaim stats must hold + * lru_lock and have preemption disabled. Use percpu counters that spill into + * atomics to allow concurrent updates when multiple readers hold lru_lock. 
+ */ + +static inline void __update_page_reclaim_stat(unsigned long count, + unsigned long *percpu_stat, + atomic_long_t *stat) +{ + unsigned long val = *percpu_stat + count; + + if (unlikely(val > RECLAIM_STAT_BATCH)) { + atomic_long_add(val, stat); + val = 0; + } + *percpu_stat = val; +} + +static inline void update_reclaim_stat_scanned(struct lruvec *lruvec, int file, + unsigned long count) +{ + struct zone_reclaim_stat_cpu __percpu *percpu_stat = + this_cpu_ptr(lruvec->reclaim_stat_cpu); + + __update_page_reclaim_stat(count, &percpu_stat->recent_scanned[file], + &lruvec->reclaim_stat.recent_scanned[file]); +} + +static inline void update_reclaim_stat_rotated(struct lruvec *lruvec, int file, + unsigned long count) +{ + struct zone_reclaim_stat_cpu __percpu *percpu_stat = + this_cpu_ptr(lruvec->reclaim_stat_cpu); + + __update_page_reclaim_stat(count, &percpu_stat->recent_rotated[file], + &lruvec->reclaim_stat.recent_rotated[file]); +} + /* Mask used at gathering information at once (see memcontrol.c) */ #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) #define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) @@ -795,6 +843,8 @@ extern void init_currently_empty_zone(struct zone *zone, unsigned long start_pfn unsigned long size); extern void lruvec_init(struct lruvec *lruvec); +extern void lruvec_init_late(struct lruvec *lruvec); +extern void lruvecs_init_late(void); static inline struct pglist_data *lruvec_pgdat(struct lruvec *lruvec) { diff --git a/init/main.c b/init/main.c index 3b4ada11ed52..80ad02fe99de 100644 --- a/init/main.c +++ b/init/main.c @@ -526,6 +526,7 @@ static void __init mm_init(void) init_espfix_bsp(); /* Should be run after espfix64 is set up. */ pti_init(); + lruvecs_init_late(); } asmlinkage __visible void __init start_kernel(void) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5463ad160e10..f7f9682482cd 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3152,22 +3152,22 @@ static int memcg_stat_show(struct seq_file *m, void *v) pg_data_t *pgdat; struct mem_cgroup_per_node *mz; struct zone_reclaim_stat *rstat; - unsigned long recent_rotated[2] = {0, 0}; - unsigned long recent_scanned[2] = {0, 0}; + unsigned long rota[2] = {0, 0}; + unsigned long scan[2] = {0, 0}; for_each_online_pgdat(pgdat) { mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id); rstat = &mz->lruvec.reclaim_stat; - recent_rotated[0] += rstat->recent_rotated[0]; - recent_rotated[1] += rstat->recent_rotated[1]; - recent_scanned[0] += rstat->recent_scanned[0]; - recent_scanned[1] += rstat->recent_scanned[1]; + rota[0] += atomic_long_read(&rstat->recent_rotated[0]); + rota[1] += atomic_long_read(&rstat->recent_rotated[1]); + scan[0] += atomic_long_read(&rstat->recent_scanned[0]); + scan[1] += atomic_long_read(&rstat->recent_scanned[1]); } - seq_printf(m, "recent_rotated_anon %lu\n", recent_rotated[0]); - seq_printf(m, "recent_rotated_file %lu\n", recent_rotated[1]); - seq_printf(m, "recent_scanned_anon %lu\n", recent_scanned[0]); - seq_printf(m, "recent_scanned_file %lu\n", recent_scanned[1]); + seq_printf(m, "recent_rotated_anon %lu\n", rota[0]); + seq_printf(m, "recent_rotated_file %lu\n", rota[1]); + seq_printf(m, "recent_scanned_anon %lu\n", scan[0]); + seq_printf(m, "recent_scanned_file %lu\n", scan[1]); } #endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 25982467800b..d3ebb11c3f9f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1009,6 +1009,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) /* init node's zones as 
empty zones, we don't have any present pages.*/ free_area_init_node(nid, zones_size, start_pfn, zholes_size); pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); + lruvec_init_late(node_lruvec(pgdat)); /* * The node we allocated has no zone fallback lists. For avoiding diff --git a/mm/mmzone.c b/mm/mmzone.c index 4686fdc23bb9..090cd4f7effb 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -9,6 +9,7 @@ #include #include #include +#include struct pglist_data *first_online_pgdat(void) { @@ -96,6 +97,19 @@ void lruvec_init(struct lruvec *lruvec) INIT_LIST_HEAD(&lruvec->lists[lru]); } +void lruvec_init_late(struct lruvec *lruvec) +{ + lruvec->reclaim_stat_cpu = alloc_percpu(struct zone_reclaim_stat_cpu); +} + +void lruvecs_init_late(void) +{ + pg_data_t *pgdat; + + for_each_online_pgdat(pgdat) + lruvec_init_late(node_lruvec(pgdat)); +} + #if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) int page_cpupid_xchg_last(struct page *page, int cpupid) { diff --git a/mm/swap.c b/mm/swap.c index 3dd518832096..219c234d632f 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "internal.h" @@ -260,14 +261,19 @@ void rotate_reclaimable_page(struct page *page) } } +/* + * Updates page reclaim statistics using per-cpu counters that spill into + * atomics above a threshold. + * + * Assumes that the caller has disabled preemption. IRQs may be enabled + * because this function is not called from irq context. + */ static void update_page_reclaim_stat(struct lruvec *lruvec, int file, int rotated) { - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; - - reclaim_stat->recent_scanned[file]++; + update_reclaim_stat_scanned(lruvec, file, 1); if (rotated) - reclaim_stat->recent_rotated[file]++; + update_reclaim_stat_rotated(lruvec, file, 1); } static void __activate_page(struct page *page, struct lruvec *lruvec, diff --git a/mm/vmscan.c b/mm/vmscan.c index 9270a4370d54..730b6d0c6c61 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1655,7 +1655,6 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, static noinline_for_stack void putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) { - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; struct pglist_data *pgdat = lruvec_pgdat(lruvec); LIST_HEAD(pages_to_free); @@ -1684,7 +1683,7 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) if (is_active_lru(lru)) { int file = is_file_lru(lru); int numpages = hpage_nr_pages(page); - reclaim_stat->recent_rotated[file] += numpages; + update_reclaim_stat_rotated(lruvec, file, numpages); } if (put_page_testzero(page)) { __ClearPageLRU(page); @@ -1736,7 +1735,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, isolate_mode_t isolate_mode = 0; int file = is_file_lru(lru); struct pglist_data *pgdat = lruvec_pgdat(lruvec); - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; bool stalled = false; while (unlikely(too_many_isolated(pgdat, file, sc))) { @@ -1763,7 +1761,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, &nr_scanned, sc, isolate_mode, lru); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); - reclaim_stat->recent_scanned[file] += nr_taken; + update_reclaim_stat_scanned(lruvec, file, nr_taken); if (current_is_kswapd()) { if (global_reclaim(sc)) @@ -1914,7 +1912,6 @@ static void shrink_active_list(unsigned long nr_to_scan, LIST_HEAD(l_active); LIST_HEAD(l_inactive); struct page *page; 
- struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; unsigned nr_deactivate, nr_activate; unsigned nr_rotated = 0; isolate_mode_t isolate_mode = 0; @@ -1932,7 +1929,7 @@ static void shrink_active_list(unsigned long nr_to_scan, &nr_scanned, sc, isolate_mode, lru); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); - reclaim_stat->recent_scanned[file] += nr_taken; + update_reclaim_stat_scanned(lruvec, file, nr_taken); __count_vm_events(PGREFILL, nr_scanned); count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); @@ -1989,7 +1986,7 @@ static void shrink_active_list(unsigned long nr_to_scan, * helps balance scan pressure between file and anonymous pages in * get_scan_count. */ - reclaim_stat->recent_rotated[file] += nr_rotated; + update_reclaim_stat_rotated(lruvec, file, nr_rotated); nr_activate = move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru); nr_deactivate = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE); @@ -2116,7 +2113,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, unsigned long *lru_pages) { int swappiness = mem_cgroup_swappiness(memcg); - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; + struct zone_reclaim_stat *rstat = &lruvec->reclaim_stat; u64 fraction[2]; u64 denominator = 0; /* gcc */ struct pglist_data *pgdat = lruvec_pgdat(lruvec); @@ -2125,6 +2122,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, unsigned long anon, file; unsigned long ap, fp; enum lru_list lru; + long recent_scanned[2], recent_rotated[2]; /* If we have no swap space, do not bother scanning anon pages. */ if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) { @@ -2238,14 +2236,22 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); spin_lock_irq(&pgdat->lru_lock); - if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) { - reclaim_stat->recent_scanned[0] /= 2; - reclaim_stat->recent_rotated[0] /= 2; + recent_scanned[0] = atomic_long_read(&rstat->recent_scanned[0]); + recent_rotated[0] = atomic_long_read(&rstat->recent_rotated[0]); + if (unlikely(recent_scanned[0] > anon / 4)) { + recent_scanned[0] /= 2; + recent_rotated[0] /= 2; + atomic_long_set(&rstat->recent_scanned[0], recent_scanned[0]); + atomic_long_set(&rstat->recent_rotated[0], recent_rotated[0]); } - if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) { - reclaim_stat->recent_scanned[1] /= 2; - reclaim_stat->recent_rotated[1] /= 2; + recent_scanned[1] = atomic_long_read(&rstat->recent_scanned[1]); + recent_rotated[1] = atomic_long_read(&rstat->recent_rotated[1]); + if (unlikely(recent_scanned[1] > file / 4)) { + recent_scanned[1] /= 2; + recent_rotated[1] /= 2; + atomic_long_set(&rstat->recent_scanned[1], recent_scanned[1]); + atomic_long_set(&rstat->recent_rotated[1], recent_rotated[1]); } /* @@ -2253,11 +2259,11 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, * proportional to the fraction of recently scanned pages on * each list that were recently referenced and in active use. 
*/ - ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1); - ap /= reclaim_stat->recent_rotated[0] + 1; + ap = anon_prio * (recent_scanned[0] + 1); + ap /= recent_rotated[0] + 1; - fp = file_prio * (reclaim_stat->recent_scanned[1] + 1); - fp /= reclaim_stat->recent_rotated[1] + 1; + fp = file_prio * (recent_scanned[1] + 1); + fp /= recent_rotated[1] + 1; spin_unlock_irq(&pgdat->lru_lock); fraction[0] = ap;
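The get_scan_count() hunks just above are the subtle part of this patch:
the periodic decay of recent_scanned/recent_rotated now operates on
snapshots taken with atomic_long_read() and written back with
atomic_long_set(), still under lru_lock, while any deltas parked in the
per-cpu counters are simply left out of the decay.  A hedged user-space
sketch of that read-and-halve step follows, with invented names and plain
C11 atomics rather than the kernel API.

#include <stdatomic.h>
#include <stdio.h>

static atomic_long recent_scanned;      /* like rstat->recent_scanned[file] */
static atomic_long recent_rotated;      /* like rstat->recent_rotated[file] */

/* Runs with the (still exclusive) lru_lock held, so decays cannot race. */
static void decay_if_needed(long lru_size)
{
        long scanned = atomic_load(&recent_scanned);
        long rotated = atomic_load(&recent_rotated);

        if (scanned > lru_size / 4) {   /* same trigger as the kernel code */
                atomic_store(&recent_scanned, scanned / 2);
                atomic_store(&recent_rotated, rotated / 2);
        }
}

int main(void)
{
        atomic_store(&recent_scanned, 900);
        atomic_store(&recent_rotated, 300);

        decay_if_needed(2048);          /* 900 > 2048/4, so both counters halve */
        printf("scanned=%ld rotated=%ld\n",
               atomic_load(&recent_scanned),
               atomic_load(&recent_rotated));
        return 0;
}

Because the decay runs under lru_lock (and later in the series this path
takes the new rwlock as a writer), two decays cannot race with each
other; the only imprecision is the per-cpu remainders that escape the
halving, bounded by RECLAIM_STAT_BATCH per CPU.
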
From patchwork Tue Sep 11 00:42:35 2018
X-Patchwork-Submitter: Daniel Jordan <daniel.m.jordan@oracle.com>
X-Patchwork-Id: 10594953
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org,
    dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org,
    levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
    mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com,
    tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 3/8] mm: convert lru_lock from a spinlock_t to a rwlock_t
Date: Mon, 10 Sep 2018 20:42:35 -0400
Message-Id: <20180911004240.4758-4-daniel.m.jordan@oracle.com>
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>

lru_lock is currently a spinlock, which allows only one task at a time to
add or remove pages from any of a node's LRU lists, even if the pages are
in different parts of the same LRU or on different LRUs altogether.  This
bottleneck shows up in memory-intensive database workloads such as
decision support and data warehousing.  In the artificial benchmark
will-it-scale/page_fault1, the lock contributes to system anti-scaling,
so that adding more processes causes less total work to be done.

To prepare for better lru_lock scalability, change lru_lock into an
rwlock_t.  For now, just make all users take the lock as writers.  Later,
to allow concurrent operations, change some users to acquire it as
readers, which will synchronize amongst themselves in a fine-grained,
per-page way; this is explained in more detail later in the series.

RW locks are slower than spinlocks.  However, our results show that low
task counts do not significantly regress, even in the stress test
page_fault1, and high task counts enjoy much better scalability.

zone->lock is often taken around the same time as lru_lock and
contributes to this bottleneck.  For the full performance benefits of
this work to be realized, both locks must be fixed, but changing lru_lock
in isolation still allows modest performance improvements and is one step
toward fixing the larger problem.

Remove the spin_is_locked check in lru_add_page_tail.  Unfortunately,
rwlock_t lacks an equivalent, and adding one would require 17 new
arch_write_is_locked functions, a heavy price for a single debugging
check.

Yosef Lev had the idea of using a reader-writer lock to split up the code
that lru_lock protects, a form of room synchronization.
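Since the fine-grained reader side only arrives later in the series, a
hedged sketch of the usage pattern described above may help: most paths
would take lru_lock as readers and rely on a per-page atomic to decide
ownership, while paths that need the whole list quiescent take it as a
writer.  This is a user-space analogy with pthreads and C11 atomics,
with invented names, not the kernel rwlock_t API.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct page { atomic_flag claimed; };  /* stand-in for a per-page flag */

static pthread_rwlock_t lru_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Fine-grained path: many of these may run concurrently as readers. */
static bool isolate_page(struct page *page)
{
        bool ok;

        pthread_rwlock_rdlock(&lru_lock);
        /* Per-page "room" synchronization: only one claimant wins. */
        ok = !atomic_flag_test_and_set(&page->claimed);
        pthread_rwlock_unlock(&lru_lock);
        return ok;
}

/* Coarse path: excludes all readers, like today's exclusive lru_lock. */
static void drain_whole_list(void)
{
        pthread_rwlock_wrlock(&lru_lock);
        /* ... walk and rebuild the list with no one else inside ... */
        pthread_rwlock_unlock(&lru_lock);
}

int main(void)
{
        struct page p = { ATOMIC_FLAG_INIT };
        bool first = isolate_page(&p);
        bool second = isolate_page(&p);

        printf("first claim: %d, second claim: %d\n", first, second);
        drain_whole_list();
        return 0;
}

In this patch itself every lock site simply becomes
write_lock_irq()/write_lock_irqsave(), so locking behavior is unchanged;
the sketch only shows where the later reader conversions are headed.
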
Suggested-by: Yosef Lev Signed-off-by: Daniel Jordan --- include/linux/mmzone.h | 4 +- mm/compaction.c | 99 ++++++++++++++++++++++-------------------- mm/huge_memory.c | 6 +-- mm/memcontrol.c | 4 +- mm/mlock.c | 10 ++--- mm/page_alloc.c | 2 +- mm/page_idle.c | 4 +- mm/swap.c | 44 +++++++++++-------- mm/vmscan.c | 42 +++++++++--------- 9 files changed, 112 insertions(+), 103 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 6d4c23a3069d..c140aa9290a8 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -742,7 +742,7 @@ typedef struct pglist_data { /* Write-intensive fields used by page reclaim */ ZONE_PADDING(_pad1_) - spinlock_t lru_lock; + rwlock_t lru_lock; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* @@ -783,7 +783,7 @@ typedef struct pglist_data { #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) #define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid)) -static inline spinlock_t *zone_lru_lock(struct zone *zone) +static inline rwlock_t *zone_lru_lock(struct zone *zone) { return &zone->zone_pgdat->lru_lock; } diff --git a/mm/compaction.c b/mm/compaction.c index 29bd1df18b98..1d3c3f872a19 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -347,20 +347,20 @@ static inline void update_pageblock_skip(struct compact_control *cc, * Returns true if the lock is held * Returns false if the lock is not held and compaction should abort */ -static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags, - struct compact_control *cc) -{ - if (cc->mode == MIGRATE_ASYNC) { - if (!spin_trylock_irqsave(lock, *flags)) { - cc->contended = true; - return false; - } - } else { - spin_lock_irqsave(lock, *flags); - } - - return true; -} +#define compact_trylock(lock, flags, cc, lockf, trylockf) \ +({ \ + bool __ret = true; \ + if ((cc)->mode == MIGRATE_ASYNC) { \ + if (!trylockf((lock), *(flags))) { \ + (cc)->contended = true; \ + __ret = false; \ + } \ + } else { \ + lockf((lock), *(flags)); \ + } \ + \ + __ret; \ +}) /* * Compaction requires the taking of some coarse locks that are potentially @@ -377,29 +377,29 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags, * Returns false when compaction can continue (sync compaction might have * scheduled) */ -static bool compact_unlock_should_abort(spinlock_t *lock, - unsigned long flags, bool *locked, struct compact_control *cc) -{ - if (*locked) { - spin_unlock_irqrestore(lock, flags); - *locked = false; - } - - if (fatal_signal_pending(current)) { - cc->contended = true; - return true; - } - - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) { - cc->contended = true; - return true; - } - cond_resched(); - } - - return false; -} +#define compact_unlock_should_abort(lock, flags, locked, cc, unlockf) \ +({ \ + bool __ret = false; \ + \ + if (*(locked)) { \ + unlockf((lock), (flags)); \ + *(locked) = false; \ + } \ + \ + if (fatal_signal_pending(current)) { \ + (cc)->contended = true; \ + __ret = true; \ + } else if (need_resched()) { \ + if ((cc)->mode == MIGRATE_ASYNC) { \ + (cc)->contended = true; \ + __ret = true; \ + } else { \ + cond_resched(); \ + } \ + } \ + \ + __ret; \ +}) /* * Aside from avoiding lock contention, compaction also periodically checks @@ -457,7 +457,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, */ if (!(blockpfn % SWAP_CLUSTER_MAX) && compact_unlock_should_abort(&cc->zone->lock, flags, - &locked, cc)) + &locked, cc, spin_unlock_irqrestore)) break; nr_scanned++; @@ -502,8 +502,9 @@ static unsigned long 
isolate_freepages_block(struct compact_control *cc, * spin on the lock and we acquire the lock as late as * possible. */ - locked = compact_trylock_irqsave(&cc->zone->lock, - &flags, cc); + locked = compact_trylock(&cc->zone->lock, &flags, cc, + spin_lock_irqsave, + spin_trylock_irqsave); if (!locked) break; @@ -757,8 +758,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, * if contended. */ if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(zone_lru_lock(zone), flags, - &locked, cc)) + && compact_unlock_should_abort(zone_lru_lock(zone), + flags, &locked, cc, write_unlock_irqrestore)) break; if (!pfn_valid_within(low_pfn)) @@ -817,8 +818,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { if (locked) { - spin_unlock_irqrestore(zone_lru_lock(zone), - flags); + write_unlock_irqrestore( + zone_lru_lock(zone), flags); locked = false; } @@ -847,8 +848,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* If we already hold the lock, we can skip some rechecking */ if (!locked) { - locked = compact_trylock_irqsave(zone_lru_lock(zone), - &flags, cc); + locked = compact_trylock(zone_lru_lock(zone), &flags, + cc, write_lock_irqsave, + write_trylock_irqsave); if (!locked) break; @@ -912,7 +914,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, */ if (nr_isolated) { if (locked) { - spin_unlock_irqrestore(zone_lru_lock(zone), flags); + write_unlock_irqrestore(zone_lru_lock(zone), + flags); locked = false; } putback_movable_pages(&cc->migratepages); @@ -939,7 +942,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, low_pfn = end_pfn; if (locked) - spin_unlock_irqrestore(zone_lru_lock(zone), flags); + write_unlock_irqrestore(zone_lru_lock(zone), flags); /* * Update the pageblock-skip information and cached scanner pfn, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b9f3dbd885bd..6ad045df967d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2453,7 +2453,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, xa_unlock(&head->mapping->i_pages); } - spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); + write_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); unfreeze_page(head); @@ -2653,7 +2653,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) lru_add_drain(); /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags); + write_lock_irqsave(zone_lru_lock(page_zone(head)), flags); if (mapping) { void **pslot; @@ -2701,7 +2701,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) spin_unlock(&pgdata->split_queue_lock); fail: if (mapping) xa_unlock(&mapping->i_pages); - spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); + write_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); unfreeze_page(head); ret = -EBUSY; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f7f9682482cd..0580aff3bd98 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2043,7 +2043,7 @@ static void lock_page_lru(struct page *page, int *isolated) { struct zone *zone = page_zone(page); - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); if (PageLRU(page)) { struct lruvec *lruvec; @@ -2067,7 +2067,7 @@ static void unlock_page_lru(struct page *page, int isolated) SetPageLRU(page); 
add_page_to_lru_list(page, lruvec, page_lru(page)); } - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); } static void commit_charge(struct page *page, struct mem_cgroup *memcg, diff --git a/mm/mlock.c b/mm/mlock.c index 74e5a6547c3d..f3c628e0eeb0 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -194,7 +194,7 @@ unsigned int munlock_vma_page(struct page *page) * might otherwise copy PageMlocked to part of the tail pages before * we clear it in the head page. It also stabilizes hpage_nr_pages(). */ - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ @@ -206,14 +206,14 @@ unsigned int munlock_vma_page(struct page *page) __mod_zone_page_state(zone, NR_MLOCK, -nr_pages); if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); __munlock_isolated_page(page); goto out; } __munlock_isolation_failed(page); unlock_out: - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); out: return nr_pages - 1; @@ -298,7 +298,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; @@ -325,7 +325,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) pvec->pages[i] = NULL; } __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 22320ea27489..ca6620042431 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6222,7 +6222,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat) init_waitqueue_head(&pgdat->kcompactd_wait); #endif pgdat_page_ext_init(pgdat); - spin_lock_init(&pgdat->lru_lock); + rwlock_init(&pgdat->lru_lock); lruvec_init(node_lruvec(pgdat)); pgdat->per_cpu_nodestats = &boot_nodestats; diff --git a/mm/page_idle.c b/mm/page_idle.c index e412a63b2b74..60118aa1b1ef 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -42,12 +42,12 @@ static struct page *page_idle_get_page(unsigned long pfn) return NULL; zone = page_zone(page); - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); if (unlikely(!PageLRU(page))) { put_page(page); page = NULL; } - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); return page; } diff --git a/mm/swap.c b/mm/swap.c index 219c234d632f..a16ba5194e1c 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -63,12 +63,12 @@ static void __page_cache_release(struct page *page) struct lruvec *lruvec; unsigned long flags; - spin_lock_irqsave(zone_lru_lock(zone), flags); + write_lock_irqsave(zone_lru_lock(zone), flags); lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); - spin_unlock_irqrestore(zone_lru_lock(zone), flags); + write_unlock_irqrestore(zone_lru_lock(zone), flags); } __ClearPageWaiters(page); mem_cgroup_uncharge(page); @@ -200,17 +200,19 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, struct pglist_data *pagepgdat = page_pgdat(page); if (pagepgdat != pgdat) { - if (pgdat) - 
spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (pgdat) { + write_unlock_irqrestore(&pgdat->lru_lock, + flags); + } pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); + write_lock_irqsave(&pgdat->lru_lock, flags); } lruvec = mem_cgroup_page_lruvec(page, pgdat); (*move_fn)(page, lruvec, arg); } if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + write_unlock_irqrestore(&pgdat->lru_lock, flags); release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } @@ -336,9 +338,9 @@ void activate_page(struct page *page) struct zone *zone = page_zone(page); page = compound_head(page); - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); __activate_page(page, mem_cgroup_page_lruvec(page, zone->zone_pgdat), NULL); - spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); } #endif @@ -735,7 +737,8 @@ void release_pages(struct page **pages, int nr) * same pgdat. The lock is held only if pgdat != NULL. */ if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + write_unlock_irqrestore(&locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } @@ -745,8 +748,9 @@ void release_pages(struct page **pages, int nr) /* Device public page can not be huge page */ if (is_device_public_page(page)) { if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); + write_unlock_irqrestore( + &locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } put_zone_device_private_or_public_page(page); @@ -759,7 +763,9 @@ void release_pages(struct page **pages, int nr) if (PageCompound(page)) { if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + write_unlock_irqrestore( + &locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } __put_compound_page(page); @@ -770,12 +776,14 @@ void release_pages(struct page **pages, int nr) struct pglist_data *pgdat = page_pgdat(page); if (pgdat != locked_pgdat) { - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); + if (locked_pgdat) { + write_unlock_irqrestore( + &locked_pgdat->lru_lock, flags); + } lock_batch = 0; locked_pgdat = pgdat; - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); + write_lock_irqsave(&locked_pgdat->lru_lock, + flags); } lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); @@ -791,7 +799,7 @@ void release_pages(struct page **pages, int nr) list_add(&page->lru, &pages_to_free); } if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + write_unlock_irqrestore(&locked_pgdat->lru_lock, flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); @@ -829,8 +837,6 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, VM_BUG_ON_PAGE(!PageHead(page), page); VM_BUG_ON_PAGE(PageCompound(page_tail), page); VM_BUG_ON_PAGE(PageLRU(page_tail), page); - VM_BUG_ON(NR_CPUS != 1 && - !spin_is_locked(&lruvec_pgdat(lruvec)->lru_lock)); if (!list) SetPageLRU(page_tail); diff --git a/mm/vmscan.c b/mm/vmscan.c index 730b6d0c6c61..e6f8f05d1bc6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1601,7 +1601,7 @@ int isolate_lru_page(struct page *page) struct zone *zone = page_zone(page); struct lruvec *lruvec; - spin_lock_irq(zone_lru_lock(zone)); + write_lock_irq(zone_lru_lock(zone)); lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat); if (PageLRU(page)) { int lru = page_lru(page); @@ -1610,7 +1610,7 @@ int isolate_lru_page(struct page *page) del_page_from_lru_list(page, lruvec, lru); ret = 0; } - 
spin_unlock_irq(zone_lru_lock(zone)); + write_unlock_irq(zone_lru_lock(zone)); } return ret; } @@ -1668,9 +1668,9 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); putback_lru_page(page); - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); continue; } @@ -1691,10 +1691,10 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) del_page_from_lru_list(page, lruvec, lru); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge(page); (*get_compound_page_dtor(page))(page); - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); } else list_add(&page->lru, &pages_to_free); } @@ -1755,7 +1755,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, if (!sc->may_unmap) isolate_mode |= ISOLATE_UNMAPPED; - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, isolate_mode, lru); @@ -1774,7 +1774,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, count_memcg_events(lruvec_memcg(lruvec), PGSCAN_DIRECT, nr_scanned); } - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); if (nr_taken == 0) return 0; @@ -1782,7 +1782,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0, &stat, false); - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); if (current_is_kswapd()) { if (global_reclaim(sc)) @@ -1800,7 +1800,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&page_list); free_unref_page_list(&page_list); @@ -1880,10 +1880,10 @@ static unsigned move_active_pages_to_lru(struct lruvec *lruvec, del_page_from_lru_list(page, lruvec, lru); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge(page); (*get_compound_page_dtor(page))(page); - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); } else list_add(&page->lru, pages_to_free); } else { @@ -1923,7 +1923,7 @@ static void shrink_active_list(unsigned long nr_to_scan, if (!sc->may_unmap) isolate_mode |= ISOLATE_UNMAPPED; - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, isolate_mode, lru); @@ -1934,7 +1934,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGREFILL, nr_scanned); count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); while (!list_empty(&l_hold)) { cond_resched(); @@ -1979,7 +1979,7 @@ static void shrink_active_list(unsigned long nr_to_scan, /* * Move pages back to the lru list. */ - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); /* * Count referenced pages from currently used mappings as rotated, * even though only some of them are actually re-activated. 
This @@ -1991,7 +1991,7 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_activate = move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru); nr_deactivate = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&l_hold); free_unref_page_list(&l_hold); @@ -2235,7 +2235,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) + lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); recent_scanned[0] = atomic_long_read(&rstat->recent_scanned[0]); recent_rotated[0] = atomic_long_read(&rstat->recent_rotated[0]); if (unlikely(recent_scanned[0] > anon / 4)) { @@ -2264,7 +2264,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, fp = file_prio * (recent_scanned[1] + 1); fp /= recent_rotated[1] + 1; - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); fraction[0] = ap; fraction[1] = fp; @@ -3998,9 +3998,9 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages) pgscanned++; if (pagepgdat != pgdat) { if (pgdat) - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); pgdat = pagepgdat; - spin_lock_irq(&pgdat->lru_lock); + write_lock_irq(&pgdat->lru_lock); } lruvec = mem_cgroup_page_lruvec(page, pgdat); @@ -4021,7 +4021,7 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages) if (pgdat) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - spin_unlock_irq(&pgdat->lru_lock); + write_unlock_irq(&pgdat->lru_lock); } } #endif /* CONFIG_SHMEM */ From patchwork Tue Sep 11 00:59:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Jordan X-Patchwork-Id: 10594955 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A21D14E5 for ; Tue, 11 Sep 2018 01:00:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7CAE729010 for ; Tue, 11 Sep 2018 01:00:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6FB2A2901E; Tue, 11 Sep 2018 01:00:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7669929010 for ; Tue, 11 Sep 2018 01:00:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0064B8E0005; Mon, 10 Sep 2018 21:00:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F1D6C8E0001; Mon, 10 Sep 2018 21:00:07 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E34F78E0005; Mon, 10 Sep 2018 21:00:07 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: 
From: Daniel Jordan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org, dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org, levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com, tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 4/8] mm: introduce smp_list_del for concurrent list entry removals
Date: Mon, 10 Sep 2018 20:59:45 -0400
Message-Id: <20180911005949.5635-1-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
X-Virus-Scanned: ClamAV using
ClamSMTP Now that the LRU lock is a RW lock, lay the groundwork for fine-grained synchronization so that multiple threads holding the lock as reader can safely remove pages from an LRU at the same time. Add a thread-safe variant of list_del called smp_list_del that allows multiple threads to delete nodes from a list, and wrap this new list API in smp_del_page_from_lru to get the LRU statistics updates right. For bisectability's sake, call the new function only when holding lru_lock as writer. In the next patch, switch to taking it as reader. The algorithm is explained in detail in the comments. Yosef Lev conceived of the algorithm, and this patch is heavily based on an earlier version from him. Thanks to Dave Dice for suggesting the prefetch. Signed-off-by: Yosef Lev Signed-off-by: Daniel Jordan --- include/linux/list.h | 2 + include/linux/mm_inline.h | 28 +++++++ lib/Makefile | 2 +- lib/list.c | 158 ++++++++++++++++++++++++++++++++++++++ mm/swap.c | 3 +- 5 files changed, 191 insertions(+), 2 deletions(-) create mode 100644 lib/list.c diff --git a/include/linux/list.h b/include/linux/list.h index 4b129df4d46b..bb80fe9b48cf 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -47,6 +47,8 @@ static inline bool __list_del_entry_valid(struct list_head *entry) } #endif +extern void smp_list_del(struct list_head *entry); + /* * Insert a new entry between two known consecutive entries. * diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 10191c28fc04..335bb9ba6510 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -4,6 +4,7 @@ #include #include +#include /** * page_is_file_cache - should the page be on a file LRU or anon LRU? @@ -65,6 +66,33 @@ static __always_inline void del_page_from_lru_list(struct page *page, update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page)); } +/** + * smp_del_page_from_lru_list - thread-safe del_page_from_lru_list + * @page: page to delete from the LRU + * @lruvec: vector of LRUs + * @lru: type of LRU list to delete from within the lruvec + * + * Requires lru_lock to be held, preferably as reader for greater concurrency + * with other LRU operations but writers are also correct. + * + * Holding lru_lock as reader, the only unprotected shared state is @page's + * lru links, which smp_list_del safely handles. lru_lock excludes other + * writers, and the atomics and per-cpu counters in update_lru_size serialize + * racing stat updates. + * + * Concurrent removal of adjacent pages is expected to be rare. In + * will-it-scale/page_fault1, the ratio of iterations of any while loop in + * smp_list_del to calls to that function was less than 0.009% (and 0.009% was + * an outlier on an oversubscribed 44 core system). + */ +static __always_inline void smp_del_page_from_lru_list(struct page *page, + struct lruvec *lruvec, + enum lru_list lru) +{ + smp_list_del(&page->lru); + update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page)); +} + /** * page_lru_base_type - which LRU list type should a page be on? 
* @page: the page to test diff --git a/lib/Makefile b/lib/Makefile index ce20696d5a92..f0689480f704 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \ gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \ bsearch.o find_bit.o llist.o memweight.o kfifo.o \ percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \ - once.o refcount.o usercopy.o errseq.o bucket_locks.o + once.o refcount.o usercopy.o errseq.o bucket_locks.o list.o obj-$(CONFIG_STRING_SELFTEST) += test_string.o obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o diff --git a/lib/list.c b/lib/list.c new file mode 100644 index 000000000000..22188fc0316d --- /dev/null +++ b/lib/list.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2017, 2018 Oracle and/or its affiliates. All rights reserved. + * + * Authors: Yosef Lev + * Daniel Jordan + */ + +#include +#include + +/* + * smp_list_del is a variant of list_del that allows concurrent list removals + * under certain assumptions. The idea is to get away from overly coarse + * synchronization, such as using a lock to guard an entire list, which + * serializes all operations even though those operations might be happening on + * disjoint parts. + * + * If you want to use other functions from the list API concurrently, + * additional synchronization may be necessary. For example, you could use a + * rwlock as a two-mode lock, where readers use the lock in shared mode and are + * allowed to call smp_list_del concurrently, and writers use the lock in + * exclusive mode and are allowed to use all list operations. + */ + +/** + * smp_list_del - concurrent variant of list_del + * @entry: entry to delete from the list + * + * Safely removes an entry from the list in the presence of other threads that + * may try to remove adjacent entries. Uses the entry's next field and the + * predecessor entry's next field as locks to accomplish this. + * + * Assumes that no two threads may try to delete the same entry. This + * assumption holds, for example, if the objects on the list are + * reference-counted so that an object is only removed when its refcount falls + * to 0. + * + * @entry's next and prev fields are poisoned on return just as with list_del. + */ +void smp_list_del(struct list_head *entry) +{ + struct list_head *succ, *pred, *pred_reread; + + /* + * The predecessor entry's cacheline is read before it's written, so to + * avoid an unnecessary cacheline state transition, prefetch for + * writing. In the common case, the predecessor won't change. + */ + prefetchw(entry->prev); + + /* + * Step 1: Lock @entry E by making its next field point to its + * predecessor D. This prevents any thread from removing the + * predecessor because that thread will loop in its step 4 while + * E->next == D. This also prevents any thread from removing the + * successor F because that thread will see that F->prev->next != F in + * the cmpxchg in its step 3. Retry if the successor is being removed + * and has already set this field to NULL in step 3. + */ + succ = READ_ONCE(entry->next); + pred = READ_ONCE(entry->prev); + while (succ == NULL || cmpxchg(&entry->next, succ, pred) != succ) { + /* + * Reread @entry's successor because it may change until + * @entry's next field is locked. 
Reread the predecessor to + * have a better chance of publishing the right value and avoid + * entering the loop in step 2 while @entry is locked, + * but this isn't required for correctness because the + * predecessor is reread in step 2. + */ + cpu_relax(); + succ = READ_ONCE(entry->next); + pred = READ_ONCE(entry->prev); + } + + /* + * Step 2: A racing thread may remove @entry's predecessor. Reread and + * republish @entry->prev until it does not change. This guarantees + * that the racing thread has not passed the while loop in step 4 and + * has not freed the predecessor, so it is safe for this thread to + * access predecessor fields in step 3. + */ + pred_reread = READ_ONCE(entry->prev); + while (pred != pred_reread) { + WRITE_ONCE(entry->next, pred_reread); + pred = pred_reread; + /* + * Ensure the predecessor is published in @entry's next field + * before rereading the predecessor. Pairs with the smp_mb in + * step 4. + */ + smp_mb(); + pred_reread = READ_ONCE(entry->prev); + } + + /* + * Step 3: If the predecessor points to @entry, lock it and continue. + * Otherwise, the predecessor is being removed, so loop until that + * removal finishes and this thread's @entry->prev is updated, which + * indicates the old predecessor has reached the loop in step 4. Write + * the new predecessor into @entry->next. This both releases the old + * predecessor from its step 4 loop and sets this thread up to lock the + * new predecessor. + */ + while (pred->next != entry || + cmpxchg(&pred->next, entry, NULL) != entry) { + /* + * The predecessor is being removed so wait for a new, + * unlocked predecessor. + */ + cpu_relax(); + pred_reread = READ_ONCE(entry->prev); + if (pred != pred_reread) { + /* + * The predecessor changed, so republish it and update + * it as in step 2. + */ + WRITE_ONCE(entry->next, pred_reread); + pred = pred_reread; + /* Pairs with smp_mb in step 4. */ + smp_mb(); + } + } + + /* + * Step 4: @entry and @entry's predecessor are both locked, so now + * actually remove @entry from the list. + * + * It is safe to write to the successor's prev pointer because step 1 + * prevents the successor from being removed. + */ + + WRITE_ONCE(succ->prev, pred); + + /* + * The full barrier guarantees that all changes are visible to other + * threads before the entry is unlocked by the final write, pairing + * with the implied full barrier before the cmpxchg in step 1. + * + * The barrier also guarantees that this thread writes succ->prev + * before reading succ->next, pairing with a thread in step 2 or 3 that + * writes entry->next before reading entry->prev, which ensures that + * the one that writes second sees the update from the other. + */ + smp_mb(); + + while (READ_ONCE(succ->next) == entry) { + /* The successor is being removed, so wait for it to finish. */ + cpu_relax(); + } + + /* Simultaneously completes the removal and unlocks the predecessor. 
 */
+	WRITE_ONCE(pred->next, succ);
+
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}
diff --git a/mm/swap.c b/mm/swap.c
index a16ba5194e1c..613b841bd208 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -789,7 +789,8 @@ void release_pages(struct page **pages, int nr)
 			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
 			VM_BUG_ON_PAGE(!PageLRU(page), page);
 			__ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, page_off_lru(page));
+			smp_del_page_from_lru_list(page, lruvec,
+						   page_off_lru(page));
 		}

 		/* Clear Active bit in case of parallel mark_page_accessed */
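As a concrete illustration of the locking contract described above, here is a short sketch (not part of the series) of a caller that removes one page from its LRU while holding lru_lock only in shared mode, which is what the next patch switches release_pages to. The function name is hypothetical; the helpers are the ones this patch adds, and the sketch assumes, as release_pages does, that the caller holds the last reference to the page so no other task can be deleting the same entry.

static void example_remove_page_shared(struct page *page)
{
	struct pglist_data *pgdat = page_pgdat(page);
	unsigned long flags;

	/*
	 * Shared (reader) mode: other tasks may be removing adjacent
	 * pages from the same LRU at the same time.  smp_list_del, via
	 * smp_del_page_from_lru_list, handles racing neighbors, and
	 * update_lru_size keeps the statistics consistent.
	 */
	read_lock_irqsave(&pgdat->lru_lock, flags);
	if (PageLRU(page)) {
		struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);

		__ClearPageLRU(page);
		smp_del_page_from_lru_list(page, lruvec, page_off_lru(page));
	}
	read_unlock_irqrestore(&pgdat->lru_lock, flags);
}

Until the next patch flips release_pages to the reader side, every caller still takes lru_lock as writer, so the sketch shows the intended end state rather than the behavior after this patch alone.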
From patchwork Tue Sep 11 00:59:46 2018
From: Daniel Jordan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org, dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org, levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com, tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 5/8] mm: enable concurrent LRU removals
Date: Mon, 10 Sep 2018 20:59:46 -0400
Message-Id: <20180911005949.5635-2-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>

The previous patch exercised the concurrent algorithm serially to show that it is stable for a single task. Now, in release_pages, take lru_lock as reader instead of writer to allow concurrent removals from one or more LRUs.

Suggested-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 mm/swap.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 613b841bd208..b1030eb7f459 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -737,8 +737,8 @@ void release_pages(struct page **pages, int nr)
 		 * same pgdat. The lock is held only if pgdat != NULL.
*/ if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - write_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); + read_unlock_irqrestore(&locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } @@ -748,9 +748,8 @@ void release_pages(struct page **pages, int nr) /* Device public page can not be huge page */ if (is_device_public_page(page)) { if (locked_pgdat) { - write_unlock_irqrestore( - &locked_pgdat->lru_lock, - flags); + read_unlock_irqrestore(&locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } put_zone_device_private_or_public_page(page); @@ -763,9 +762,8 @@ void release_pages(struct page **pages, int nr) if (PageCompound(page)) { if (locked_pgdat) { - write_unlock_irqrestore( - &locked_pgdat->lru_lock, - flags); + read_unlock_irqrestore(&locked_pgdat->lru_lock, + flags); locked_pgdat = NULL; } __put_compound_page(page); @@ -776,14 +774,14 @@ void release_pages(struct page **pages, int nr) struct pglist_data *pgdat = page_pgdat(page); if (pgdat != locked_pgdat) { - if (locked_pgdat) { - write_unlock_irqrestore( - &locked_pgdat->lru_lock, flags); - } + if (locked_pgdat) + read_unlock_irqrestore( + &locked_pgdat->lru_lock, + flags); lock_batch = 0; locked_pgdat = pgdat; - write_lock_irqsave(&locked_pgdat->lru_lock, - flags); + read_lock_irqsave(&locked_pgdat->lru_lock, + flags); } lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); @@ -800,7 +798,7 @@ void release_pages(struct page **pages, int nr) list_add(&page->lru, &pages_to_free); } if (locked_pgdat) - write_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + read_unlock_irqrestore(&locked_pgdat->lru_lock, flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); From patchwork Tue Sep 11 00:59:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Jordan X-Patchwork-Id: 10594957 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B1C686CB for ; Tue, 11 Sep 2018 01:00:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A02F329010 for ; Tue, 11 Sep 2018 01:00:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 939CF2901E; Tue, 11 Sep 2018 01:00:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 028A629010 for ; Tue, 11 Sep 2018 01:00:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BED758E0007; Mon, 10 Sep 2018 21:00:09 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B29FD8E0001; Mon, 10 Sep 2018 21:00:09 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 97A218E0007; Mon, 10 Sep 2018 21:00:09 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-yw1-f69.google.com (mail-yw1-f69.google.com [209.85.161.69]) by kanga.kvack.org (Postfix) with ESMTP id 65B488E0001 for ; Mon, 10 Sep 2018 21:00:09 
From: Daniel Jordan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org, dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org, levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com, tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 6/8] mm: splice local lists onto the front of the LRU
Date: Mon, 10 Sep 2018 20:59:47 -0400
Message-Id: <20180911005949.5635-3-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
X-Virus-Scanned: ClamAV using ClamSMTP

The
add-to-front LRU path currently adds one page at a time to the front of an LRU. This is slow when using the concurrent algorithm described in the next patch because the LRU head node will be locked for every page that's added. Instead, prepare local lists of pages, grouped by LRU, to be added to a given LRU in a single splice operation. The batching effect will reduce the amount of time that the LRU head is locked per page added. Signed-off-by: Daniel Jordan --- mm/swap.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 119 insertions(+), 4 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index b1030eb7f459..07b951727a11 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -865,8 +865,52 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, - void *arg) +#define MAX_LRU_SPLICES 4 + +struct lru_splice { + struct list_head list; + struct list_head *lru; + struct pglist_data *pgdat; +}; + +/* + * Adds a page to a local list for splicing, or else to the singletons + * list for individual processing. + * + * Returns the new number of splices in the splices list. + */ +static size_t add_page_to_splice(struct page *page, struct pglist_data *pgdat, + struct lru_splice *splices, size_t nr_splices, + struct list_head *singletons, + struct list_head *lru) +{ + int i; + + for (i = 0; i < nr_splices; ++i) { + if (splices[i].lru == lru) { + list_add(&page->lru, &splices[i].list); + return nr_splices; + } + } + + if (nr_splices < MAX_LRU_SPLICES) { + INIT_LIST_HEAD(&splices[nr_splices].list); + splices[nr_splices].lru = lru; + splices[nr_splices].pgdat = pgdat; + list_add(&page->lru, &splices[nr_splices].list); + ++nr_splices; + } else { + list_add(&page->lru, singletons); + } + + return nr_splices; +} + +static size_t pagevec_lru_add_splice(struct page *page, struct lruvec *lruvec, + struct pglist_data *pgdat, + struct lru_splice *splices, + size_t nr_splices, + struct list_head *singletons) { enum lru_list lru; int was_unevictable = TestClearPageUnevictable(page); @@ -916,8 +960,12 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, count_vm_event(UNEVICTABLE_PGCULLED); } - add_page_to_lru_list(page, lruvec, lru); + nr_splices = add_page_to_splice(page, pgdat, splices, nr_splices, + singletons, &lruvec->lists[lru]); + update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page)); trace_mm_lru_insertion(page, lru); + + return nr_splices; } /* @@ -926,7 +974,74 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, */ void __pagevec_lru_add(struct pagevec *pvec) { - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL); + int i; + struct pglist_data *pagepgdat, *pgdat = NULL; + unsigned long flags = 0; + struct lru_splice splices[MAX_LRU_SPLICES]; + size_t nr_splices = 0; + LIST_HEAD(singletons); + struct page *page; + struct lruvec *lruvec; + enum lru_list lru; + + /* + * Sort the pages into local lists to splice onto the LRU. In the + * common case there should be few of these local lists. + */ + for (i = 0; i < pagevec_count(pvec); ++i) { + page = pvec->pages[i]; + pagepgdat = page_pgdat(page); + + /* + * Take lru_lock now so that setting PageLRU and setting the + * local list's links appear to happen atomically. 
+		 */
+		if (pagepgdat != pgdat) {
+			if (pgdat)
+				write_unlock_irqrestore(&pgdat->lru_lock, flags);
+			pgdat = pagepgdat;
+			write_lock_irqsave(&pgdat->lru_lock, flags);
+		}
+
+		lruvec = mem_cgroup_page_lruvec(page, pagepgdat);
+
+		nr_splices = pagevec_lru_add_splice(page, lruvec, pagepgdat,
+						    splices, nr_splices,
+						    &singletons);
+	}
+
+	for (i = 0; i < nr_splices; ++i) {
+		struct lru_splice *splice = &splices[i];
+
+		if (splice->pgdat != pgdat) {
+			if (pgdat)
+				write_unlock_irqrestore(&pgdat->lru_lock, flags);
+			pgdat = splice->pgdat;
+			write_lock_irqsave(&pgdat->lru_lock, flags);
+		}
+		list_splice(&splice->list, splice->lru);
+	}
+
+	while (!list_empty(&singletons)) {
+		page = list_first_entry(&singletons, struct page, lru);
+		list_del(singletons.next);
+		pagepgdat = page_pgdat(page);
+
+		if (pagepgdat != pgdat) {
+			if (pgdat)
+				write_unlock_irqrestore(&pgdat->lru_lock, flags);
+			pgdat = pagepgdat;
+			write_lock_irqsave(&pgdat->lru_lock, flags);
+		}
+
+		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+		lru = page_lru(page);
+		list_add(&page->lru, &lruvec->lists[lru]);
+	}
+	if (pgdat)
+		write_unlock_irqrestore(&pgdat->lru_lock, flags);
+	release_pages(pvec->pages, pvec->nr);
+	pagevec_reinit(pvec);
 }
 EXPORT_SYMBOL(__pagevec_lru_add);
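Seen from a caller, the batching works through the ordinary pagevec API. The sketch below (not part of the series, function name hypothetical) assumes freshly allocated pages that are not yet on any LRU; once a pagevec fills up, __pagevec_lru_add sorts its pages into at most MAX_LRU_SPLICES local lists and attaches each one to its LRU with a single splice instead of one list_add per page under lru_lock.

static void example_add_pages_to_lru(struct page **pages, int nr)
{
	struct pagevec pvec;
	int i;

	pagevec_init(&pvec);
	for (i = 0; i < nr; i++) {
		/* The pagevec consumes one page reference per entry. */
		get_page(pages[i]);
		if (!pagevec_add(&pvec, pages[i]))
			__pagevec_lru_add(&pvec);	/* full: flush the batch */
	}
	if (pagevec_count(&pvec))
		__pagevec_lru_add(&pvec);		/* flush the remainder */
}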
From patchwork Tue Sep 11 00:59:48 2018
From: Daniel Jordan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org, dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org, levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com, tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 7/8] mm: introduce smp_list_splice to prepare for concurrent LRU adds
Date: Mon, 10 Sep 2018 20:59:48 -0400
Message-Id: <20180911005949.5635-4-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
X-Virus-Scanned: ClamAV using
ClamSMTP Now that we splice a local list onto the LRU, prepare for multiple tasks doing this concurrently by adding a variant of the kernel's list splicing API, list_splice, that's designed to work with multiple tasks. Although there is naturally less parallelism to be gained from locking the LRU head this way, the main benefit of doing this is to allow removals to happen concurrently. The way lru_lock is today, an add needlessly blocks removal of any page but the first in the LRU. For now, hold lru_lock as writer to serialize the adds to ensure the function is correct for a single thread at a time. Yosef Lev came up with this algorithm. Suggested-by: Yosef Lev Signed-off-by: Daniel Jordan --- include/linux/list.h | 1 + lib/list.c | 60 ++++++++++++++++++++++++++++++++++++++------ mm/swap.c | 3 ++- 3 files changed, 56 insertions(+), 8 deletions(-) diff --git a/include/linux/list.h b/include/linux/list.h index bb80fe9b48cf..6d964ea44f1a 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -48,6 +48,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry) #endif extern void smp_list_del(struct list_head *entry); +extern void smp_list_splice(struct list_head *list, struct list_head *head); /* * Insert a new entry between two known consecutive entries. diff --git a/lib/list.c b/lib/list.c index 22188fc0316d..d6a834ef1543 100644 --- a/lib/list.c +++ b/lib/list.c @@ -10,17 +10,18 @@ #include /* - * smp_list_del is a variant of list_del that allows concurrent list removals - * under certain assumptions. The idea is to get away from overly coarse - * synchronization, such as using a lock to guard an entire list, which - * serializes all operations even though those operations might be happening on - * disjoint parts. + * smp_list_del and smp_list_splice are variants of list_del and list_splice, + * respectively, that allow concurrent list operations under certain + * assumptions. The idea is to get away from overly coarse synchronization, + * such as using a lock to guard an entire list, which serializes all + * operations even though those operations might be happening on disjoint + * parts. * * If you want to use other functions from the list API concurrently, * additional synchronization may be necessary. For example, you could use a * rwlock as a two-mode lock, where readers use the lock in shared mode and are - * allowed to call smp_list_del concurrently, and writers use the lock in - * exclusive mode and are allowed to use all list operations. + * allowed to call smp_list_* functions concurrently, and writers use the lock + * in exclusive mode and are allowed to use all list operations. */ /** @@ -156,3 +157,48 @@ void smp_list_del(struct list_head *entry) entry->next = LIST_POISON1; entry->prev = LIST_POISON2; } + +/** + * smp_list_splice - thread-safe splice of two lists + * @list: the new list to add + * @head: the place to add it in the first list + * + * Safely handles concurrent smp_list_splice operations onto the same list head + * and concurrent smp_list_del operations of any list entry except @head. + * Assumes that @head cannot be removed. + */ +void smp_list_splice(struct list_head *list, struct list_head *head) +{ + struct list_head *first = list->next; + struct list_head *last = list->prev; + struct list_head *succ; + + /* + * Lock the front of @head by replacing its next pointer with NULL. + * Should another thread be adding to the front, wait until it's done. 
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	first->prev = head;
+	last->next = succ;
+
+	/*
+	 * It is safe to write to succ, head's successor, because locking head
+	 * prevents succ from being removed in smp_list_del.
+	 */
+	succ->prev = last;
+
+	/*
+	 * Pairs with the implied full barrier before the cmpxchg above.
+	 * Ensures the write that unlocks the head is seen last to avoid list
+	 * corruption.
+	 */
+	smp_wmb();
+
+	/* Simultaneously complete the splice and unlock the head node. */
+	WRITE_ONCE(head->next, first);
+}
diff --git a/mm/swap.c b/mm/swap.c
index 07b951727a11..fe3098c09815 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -1019,7 +1020,7 @@ void __pagevec_lru_add(struct pagevec *pvec)
 			pgdat = splice->pgdat;
 			write_lock_irqsave(&pgdat->lru_lock, flags);
 		}
-		list_splice(&splice->list, splice->lru);
+		smp_list_splice(&splice->list, splice->lru);
 	}
 
 	while (!list_empty(&singletons)) {
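As an aside, a minimal sketch of the two-mode rwlock pattern that the lib/list.c
comment above describes may make the intended usage clearer: tasks that only
call the smp_list_* helpers take a rwlock in shared mode and may run
concurrently, while any other list operation still takes the lock exclusively.
The lock, list, and function names below are hypothetical and not part of the
patch; the sketch assumes <linux/list.h> and <linux/spinlock.h> plus the
smp_list_splice introduced here.

/*
 * Hypothetical two-mode locking example; demo_lock, demo_list, and both
 * functions are illustrative only.
 */
static DEFINE_RWLOCK(demo_lock);
static LIST_HEAD(demo_list);

/* Many tasks may splice at once: the lock is only held in shared mode. */
static void demo_splice(struct list_head *local)
{
	read_lock(&demo_lock);
	smp_list_splice(local, &demo_list);
	read_unlock(&demo_lock);
}

/* Any other list operation still requires the lock in exclusive mode. */
static void demo_rotate(void)
{
	write_lock(&demo_lock);
	if (!list_empty(&demo_list))
		list_rotate_left(&demo_list);
	write_unlock(&demo_lock);
}

In this series, __pagevec_lru_add plays the "reader" role once the next patch
switches its lru_lock acquisitions to the shared-mode variants.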
From patchwork Tue Sep 11 00:59:49 2018
X-Patchwork-Submitter: Daniel Jordan
X-Patchwork-Id: 10594963
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Cc: aaron.lu@intel.com, ak@linux.intel.com, akpm@linux-foundation.org,
 dave.dice@oracle.com, dave.hansen@linux.intel.com, hannes@cmpxchg.org,
 levyossi@icloud.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
 mhocko@kernel.org, Pavel.Tatashin@microsoft.com, steven.sistare@oracle.com,
 tim.c.chen@intel.com, vdavydov.dev@gmail.com, ying.huang@intel.com
Subject: [RFC PATCH v2 8/8] mm: enable concurrent LRU adds
Date: Mon, 10 Sep 2018 20:59:49 -0400
Message-Id: <20180911005949.5635-5-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
References: <20180911004240.4758-1-daniel.m.jordan@oracle.com>
Switch over to holding lru_lock as reader when splicing pages onto the front
of an LRU.  The main benefit of doing this is to allow LRU adds and removes to
happen concurrently.  Before this patch, an add blocks all removing threads.

Suggested-by: Yosef Lev <levyossi@icloud.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
---
 mm/swap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index fe3098c09815..ccd82ef3c217 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -999,9 +999,9 @@ void __pagevec_lru_add(struct pagevec *pvec)
 		 */
 		if (pagepgdat != pgdat) {
 			if (pgdat)
-				write_unlock_irqrestore(&pgdat->lru_lock, flags);
+				read_unlock_irqrestore(&pgdat->lru_lock, flags);
 			pgdat = pagepgdat;
-			write_lock_irqsave(&pgdat->lru_lock, flags);
+			read_lock_irqsave(&pgdat->lru_lock, flags);
 		}
 
 		lruvec = mem_cgroup_page_lruvec(page, pagepgdat);
@@ -1016,12 +1016,16 @@
 
 		if (splice->pgdat != pgdat) {
 			if (pgdat)
-				write_unlock_irqrestore(&pgdat->lru_lock, flags);
+				read_unlock_irqrestore(&pgdat->lru_lock, flags);
 			pgdat = splice->pgdat;
-			write_lock_irqsave(&pgdat->lru_lock, flags);
+			read_lock_irqsave(&pgdat->lru_lock, flags);
 		}
 		smp_list_splice(&splice->list, splice->lru);
 	}
+	if (pgdat) {
+		read_unlock_irqrestore(&pgdat->lru_lock, flags);
+		pgdat = NULL;
+	}
 
 	while (!list_empty(&singletons)) {
 		page = list_first_entry(&singletons, struct page, lru);