From patchwork Tue Sep 11 05:36:08 2018
X-Patchwork-Submitter: Aaron Lu <aaron.lu@intel.com>
X-Patchwork-Id: 10595061
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 1/9] mm: do not add anon pages to LRU
Date: Tue, 11 Sep 2018 13:36:08 +0800
Message-Id: <20180911053616.6894-2-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

For testing purposes only: do not add anon pages to the LRU. This
avoids taking the LRU lock, so the zone lock can be tested exclusively.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..080641255b8b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3208,7 +3208,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(page, vma);
+	//lru_cache_add_active_or_unevictable(page, vma);
 setpte:
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
From patchwork Tue Sep 11 05:36:09 2018
X-Patchwork-Submitter: Aaron Lu <aaron.lu@intel.com>
X-Patchwork-Id: 10595063
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 2/9] mm: introduce smp_list_del for concurrent list entry removals
Date: Tue, 11 Sep 2018 13:36:09 +0800
Message-Id: <20180911053616.6894-3-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

From: Daniel Jordan

Now that the LRU lock is a RW lock, lay the groundwork for fine-grained
synchronization so that multiple
threads holding the lock as reader can safely remove pages from an LRU
at the same time.

Add a thread-safe variant of list_del called smp_list_del that allows
multiple threads to delete nodes from a list, and wrap this new list
API in smp_del_page_from_lru to get the LRU statistics updates right.

For bisectability's sake, call the new function only when holding
lru_lock as writer.  In the next patch, switch to taking it as reader.

The algorithm is explained in detail in the comments.  Yosef Lev
conceived of the algorithm, and this patch is heavily based on an
earlier version from him.  Thanks to Dave Dice for suggesting the
prefetch.

[aaronlu: only take list related code here]
Signed-off-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 include/linux/list.h |   2 +
 lib/Makefile         |   2 +-
 lib/list.c           | 158 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..0fd9c87dd14b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -47,6 +47,8 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 }
 #endif
 
+extern void smp_list_del(struct list_head *entry);
+
 /*
  * Insert a new entry between two known consecutive entries.
  *
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..9527b7484653 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -38,7 +38,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
	 percpu-refcount.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o usercopy.o errseq.o bucket_locks.o
+	 once.o refcount.o usercopy.o errseq.o bucket_locks.o list.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index 000000000000..4d0949ea1a09
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2017, 2018 Oracle and/or its affiliates. All rights reserved.
+ *
+ * Authors: Yosef Lev
+ *          Daniel Jordan
+ */
+
+#include <linux/list.h>
+#include <linux/prefetch.h>
+
+/*
+ * smp_list_del is a variant of list_del that allows concurrent list removals
+ * under certain assumptions. The idea is to get away from overly coarse
+ * synchronization, such as using a lock to guard an entire list, which
+ * serializes all operations even though those operations might be happening
+ * on disjoint parts.
+ *
+ * If you want to use other functions from the list API concurrently,
+ * additional synchronization may be necessary. For example, you could use a
+ * rwlock as a two-mode lock, where readers use the lock in shared mode and
+ * are allowed to call smp_list_del concurrently, and writers use the lock in
+ * exclusive mode and are allowed to use all list operations.
+ */
+
+/**
+ * smp_list_del - concurrent variant of list_del
+ * @entry: entry to delete from the list
+ *
+ * Safely removes an entry from the list in the presence of other threads
+ * that may try to remove adjacent entries. Uses the entry's next field and
+ * the predecessor entry's next field as locks to accomplish this.
+ *
+ * Assumes that no two threads may try to delete the same entry. This
+ * assumption holds, for example, if the objects on the list are
+ * reference-counted so that an object is only removed when its refcount
+ * falls to 0.
+ *
+ * @entry's next and prev fields are poisoned on return just as with
+ * list_del.
+ */
+void smp_list_del(struct list_head *entry)
+{
+	struct list_head *succ, *pred, *pred_reread;
+
+	/*
+	 * The predecessor entry's cacheline is read before it's written, so
+	 * to avoid an unnecessary cacheline state transition, prefetch for
+	 * writing. In the common case, the predecessor won't change.
+	 */
+	prefetchw(entry->prev);
+
+	/*
+	 * Step 1: Lock @entry E by making its next field point to its
+	 * predecessor D. This prevents any thread from removing the
+	 * predecessor because that thread will loop in its step 4 while
+	 * E->next == D. This also prevents any thread from removing the
+	 * successor F because that thread will see that F->prev->next != F in
+	 * the cmpxchg in its step 3. Retry if the successor is being removed
+	 * and has already set this field to NULL in step 3.
+	 */
+	succ = READ_ONCE(entry->next);
+	pred = READ_ONCE(entry->prev);
+	while (succ == NULL || cmpxchg(&entry->next, succ, pred) != succ) {
+		/*
+		 * Reread @entry's successor because it may change until
+		 * @entry's next field is locked. Reread the predecessor to
+		 * have a better chance of publishing the right value and
+		 * avoid entering the loop in step 2 while @entry is locked,
+		 * but this isn't required for correctness because the
+		 * predecessor is reread in step 2.
+		 */
+		cpu_relax();
+		succ = READ_ONCE(entry->next);
+		pred = READ_ONCE(entry->prev);
+	}
+
+	/*
+	 * Step 2: A racing thread may remove @entry's predecessor. Reread
+	 * and republish @entry->prev until it does not change. This
+	 * guarantees that the racing thread has not passed the while loop in
+	 * step 4 and has not freed the predecessor, so it is safe for this
+	 * thread to access predecessor fields in step 3.
+	 */
+	pred_reread = READ_ONCE(entry->prev);
+	while (pred != pred_reread) {
+		WRITE_ONCE(entry->next, pred_reread);
+		pred = pred_reread;
+		/*
+		 * Ensure the predecessor is published in @entry's next field
+		 * before rereading the predecessor. Pairs with the smp_mb in
+		 * step 4.
+		 */
+		smp_mb();
+		pred_reread = READ_ONCE(entry->prev);
+	}
+
+	/*
+	 * Step 3: If the predecessor points to @entry, lock it and continue.
+	 * Otherwise, the predecessor is being removed, so loop until that
+	 * removal finishes and this thread's @entry->prev is updated, which
+	 * indicates the old predecessor has reached the loop in step 4.
+	 * Write the new predecessor into @entry->next. This both releases
+	 * the old predecessor from its step 4 loop and sets this thread up
+	 * to lock the new predecessor.
+	 */
+	while (pred->next != entry ||
+	       cmpxchg(&pred->next, entry, NULL) != entry) {
+		/*
+		 * The predecessor is being removed so wait for a new,
+		 * unlocked predecessor.
+		 */
+		cpu_relax();
+		pred_reread = READ_ONCE(entry->prev);
+		if (pred != pred_reread) {
+			/*
+			 * The predecessor changed, so republish it and
+			 * update it as in step 2.
+			 */
+			WRITE_ONCE(entry->next, pred_reread);
+			pred = pred_reread;
+			/* Pairs with smp_mb in step 4. */
+			smp_mb();
+		}
+	}
+
+	/*
+	 * Step 4: @entry and @entry's predecessor are both locked, so now
+	 * actually remove @entry from the list.
+	 *
+	 * It is safe to write to the successor's prev pointer because step 1
+	 * prevents the successor from being removed.
+	 */
+
+	WRITE_ONCE(succ->prev, pred);
+
+	/*
+	 * The full barrier guarantees that all changes are visible to other
+	 * threads before the entry is unlocked by the final write, pairing
+	 * with the implied full barrier before the cmpxchg in step 1.
+	 *
+	 * The barrier also guarantees that this thread writes succ->prev
+	 * before reading succ->next, pairing with a thread in step 2 or 3
+	 * that writes entry->next before reading entry->prev, which ensures
+	 * that the one that writes second sees the update from the other.
+	 */
+	smp_mb();
+
+	while (READ_ONCE(succ->next) == entry) {
+		/* The successor is being removed, so wait for it to finish. */
+		cpu_relax();
+	}
+
+	/* Simultaneously completes the removal and unlocks the predecessor. */
+	WRITE_ONCE(pred->next, succ);
+
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}
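[Illustration, not part of the series: the two-mode locking scheme
described in the lib/list.c comment above could look like the sketch
below. The rwlock, list, and refcounted object are hypothetical
stand-ins, assuming <linux/list.h>, <linux/rwlock.h>,
<linux/refcount.h>, and <linux/slab.h>.]

	static DEFINE_RWLOCK(obj_list_lock);	/* two-mode lock for obj_list */
	static LIST_HEAD(obj_list);

	struct obj {
		struct list_head node;
		refcount_t ref;
	};

	/*
	 * Many threads may drop references concurrently; the refcount
	 * guarantees no two threads delete the same entry.
	 */
	static void obj_put(struct obj *o)
	{
		if (refcount_dec_and_test(&o->ref)) {
			/* Shared mode: concurrent smp_list_del is allowed. */
			read_lock(&obj_list_lock);
			smp_list_del(&o->node);
			read_unlock(&obj_list_lock);
			kfree(o);
		}
	}

	/* Writers exclude everyone and may use any list operation. */
	static void obj_list_add(struct obj *o)
	{
		write_lock(&obj_list_lock);
		list_add(&o->node, &obj_list);
		write_unlock(&obj_list_lock);
	}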
From patchwork Tue Sep 11 05:36:10 2018
X-Patchwork-Submitter: Aaron Lu <aaron.lu@intel.com>
X-Patchwork-Id: 10595065
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 3/9] mm: introduce smp_list_splice to prepare for concurrent LRU adds
Date: Tue, 11 Sep 2018 13:36:10 +0800
Message-Id: <20180911053616.6894-4-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

From: Daniel Jordan

Now that we splice a local list onto the LRU, prepare for multiple
tasks doing this concurrently by adding
a variant of the kernel's list splicing API, list_splice, that's
designed to work with multiple tasks.

Although there is naturally less parallelism to be gained from locking
the LRU head this way, the main benefit of doing this is to allow
removals to happen concurrently.  The way lru_lock is today, an add
needlessly blocks removal of any page but the first in the LRU.

For now, hold lru_lock as writer to serialize the adds to ensure the
function is correct for a single thread at a time.

Yosef Lev came up with this algorithm.

[aaronlu: drop LRU related code, keep only list related code]
Suggested-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 include/linux/list.h |  1 +
 lib/list.c           | 60 ++++++++++++++++++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 0fd9c87dd14b..5f203fb55939 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -48,6 +48,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 #endif
 
 extern void smp_list_del(struct list_head *entry);
+extern void smp_list_splice(struct list_head *list, struct list_head *head);
 
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/lib/list.c b/lib/list.c
index 4d0949ea1a09..104faa144abf 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -10,17 +10,18 @@
 #include <linux/prefetch.h>
 
 /*
- * smp_list_del is a variant of list_del that allows concurrent list removals
- * under certain assumptions. The idea is to get away from overly coarse
- * synchronization, such as using a lock to guard an entire list, which
- * serializes all operations even though those operations might be happening
- * on disjoint parts.
+ * smp_list_del and smp_list_splice are variants of list_del and list_splice,
+ * respectively, that allow concurrent list operations under certain
+ * assumptions. The idea is to get away from overly coarse synchronization,
+ * such as using a lock to guard an entire list, which serializes all
+ * operations even though those operations might be happening on disjoint
+ * parts.
  *
  * If you want to use other functions from the list API concurrently,
  * additional synchronization may be necessary. For example, you could use a
  * rwlock as a two-mode lock, where readers use the lock in shared mode and
- * are allowed to call smp_list_del concurrently, and writers use the lock in
- * exclusive mode and are allowed to use all list operations.
+ * are allowed to call smp_list_* functions concurrently, and writers use the
+ * lock in exclusive mode and are allowed to use all list operations.
  */
 
 /**
@@ -156,3 +157,48 @@ void smp_list_del(struct list_head *entry)
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;
 }
+
+/**
+ * smp_list_splice - thread-safe splice of two lists
+ * @list: the new list to add
+ * @head: the place to add it in the first list
+ *
+ * Safely handles concurrent smp_list_splice operations onto the same list
+ * head and concurrent smp_list_del operations of any list entry except
+ * @head. Assumes that @head cannot be removed.
+ */
+void smp_list_splice(struct list_head *list, struct list_head *head)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+	struct list_head *succ;
+
+	/*
+	 * Lock the front of @head by replacing its next pointer with NULL.
+	 * Should another thread be adding to the front, wait until it's done.
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	first->prev = head;
+	last->next = succ;
+
+	/*
+	 * It is safe to write to succ, head's successor, because locking head
+	 * prevents succ from being removed in smp_list_del.
+	 */
+	succ->prev = last;
+
+	/*
+	 * Pairs with the implied full barrier before the cmpxchg above.
+	 * Ensures the write that unlocks the head is seen last to avoid list
+	 * corruption.
+	 */
+	smp_wmb();
+
+	/* Simultaneously complete the splice and unlock the head node. */
+	WRITE_ONCE(head->next, first);
+}
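[Illustration, not part of the series: under the same hypothetical
rwlock scheme sketched earlier, concurrent batched adds via
smp_list_splice can coexist with concurrent smp_list_del calls, all in
shared mode. Names here are illustrative stand-ins.]

	/* Sketch: assumes <linux/list.h> and <linux/rwlock.h>. */
	static DEFINE_RWLOCK(obj_list_lock);
	static LIST_HEAD(obj_list);

	/*
	 * Publish a privately built batch at the head of the shared list.
	 * Multiple threads may splice, and may race with smp_list_del
	 * callers, as long as all of them hold the lock in shared mode.
	 */
	static void obj_publish_batch(struct list_head *batch)
	{
		if (list_empty(batch))
			return;

		read_lock(&obj_list_lock);
		smp_list_splice(batch, &obj_list);
		read_unlock(&obj_list_lock);

		/* As with list_splice, the old batch header is left stale. */
		INIT_LIST_HEAD(batch);
	}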
From patchwork Tue Sep 11 05:36:11 2018
X-Patchwork-Submitter: Aaron Lu <aaron.lu@intel.com>
X-Patchwork-Id: 10595067
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 4/9] mm: convert zone lock from spinlock to rwlock
Date: Tue, 11 Sep 2018 13:36:11 +0800
Message-Id: <20180911053616.6894-5-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

This patch converts the zone lock from a spinlock to a rwlock and always
takes the lock in write mode, so there is no functionality change.
This is a preparation for the free path to take the lock in read mode,
to make the free path work concurrently.

compact_trylock and compact_unlock_should_abort are taken from Daniel
Jordan's patch.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 include/linux/mmzone.h |  2 +-
 mm/compaction.c        | 90 +++++++++++++++++++++---------------------
 mm/hugetlb.c           |  8 ++--
 mm/page_alloc.c        | 52 ++++++++++++------------
 mm/page_isolation.c    | 12 +++---
 mm/vmstat.c            |  4 +-
 6 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..84cfa56e2d19 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -465,7 +465,7 @@ struct zone {
	unsigned long		flags;

	/* Primarily protects free_area */
-	spinlock_t		lock;
+	rwlock_t		lock;

	/* Write-intensive fields used by compaction and vmstats. */
	ZONE_PADDING(_pad2_)
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..6ecf74d8e287 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -347,20 +347,20 @@ static inline void update_pageblock_skip(struct compact_control *cc,
  * Returns true if the lock is held
  * Returns false if the lock is not held and compaction should abort
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
-						struct compact_control *cc)
-{
-	if (cc->mode == MIGRATE_ASYNC) {
-		if (!spin_trylock_irqsave(lock, *flags)) {
-			cc->contended = true;
-			return false;
-		}
-	} else {
-		spin_lock_irqsave(lock, *flags);
-	}
-
-	return true;
-}
+#define compact_trylock(lock, flags, cc, lockf, trylockf)		\
+({									\
+	bool __ret = true;						\
+	if ((cc)->mode == MIGRATE_ASYNC) {				\
+		if (!trylockf((lock), *(flags))) {			\
+			(cc)->contended = true;				\
+			__ret = false;					\
+		}							\
+	} else {							\
+		lockf((lock), *(flags));				\
+	}								\
+									\
+	__ret;								\
+})

 /*
  * Compaction requires the taking of some coarse locks that are potentially
@@ -377,29 +377,29 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
  * Returns false when compaction can continue (sync compaction might have
  * scheduled)
  */
-static bool compact_unlock_should_abort(spinlock_t *lock,
-		unsigned long flags, bool *locked, struct compact_control *cc)
-{
-	if (*locked) {
-		spin_unlock_irqrestore(lock, flags);
-		*locked = false;
-	}
-
-	if (fatal_signal_pending(current)) {
-		cc->contended = true;
-		return true;
-	}
-
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = true;
-			return true;
-		}
-		cond_resched();
-	}
-
-	return false;
-}
+#define compact_unlock_should_abort(lock, flags, locked, cc, unlockf)	\
+({									\
+	bool __ret = false;						\
+									\
+	if (*(locked)) {						\
+		unlockf((lock), (flags));				\
+		*(locked) = false;					\
+	}								\
+									\
+	if (fatal_signal_pending(current)) {				\
+		(cc)->contended = true;					\
+		__ret = true;						\
+	} else if (need_resched()) {					\
+		if ((cc)->mode == MIGRATE_ASYNC) {			\
+			(cc)->contended = true;				\
+			__ret = true;					\
+		} else {						\
+			cond_resched();					\
+		}							\
+	}								\
+									\
+	__ret;								\
+})

 /*
  * Aside from avoiding lock contention, compaction also periodically checks
@@ -457,7 +457,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
		 */
		if (!(blockpfn % SWAP_CLUSTER_MAX)
		    && compact_unlock_should_abort(&cc->zone->lock, flags,
-							&locked, cc))
+				&locked, cc, write_unlock_irqrestore))
			break;

		nr_scanned++;
@@ -502,8 +502,9 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
			 * spin on the lock and we acquire the lock as late as
			 * possible.
			 */
-			locked = compact_trylock_irqsave(&cc->zone->lock,
-								&flags, cc);
+			locked = compact_trylock(&cc->zone->lock, &flags, cc,
+						 write_lock_irqsave,
+						 write_trylock_irqsave);
			if (!locked)
				break;
@@ -541,7 +542,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
	}

	if (locked)
-		spin_unlock_irqrestore(&cc->zone->lock, flags);
+		write_unlock_irqrestore(&cc->zone->lock, flags);

	/*
	 * There is a tiny chance that we have read bogus compound_order(),
@@ -758,7 +759,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
		 */
		if (!(low_pfn % SWAP_CLUSTER_MAX)
		    && compact_unlock_should_abort(zone_lru_lock(zone), flags,
-							&locked, cc))
+				&locked, cc, spin_unlock_irqrestore))
			break;

		if (!pfn_valid_within(low_pfn))
@@ -847,8 +848,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

		/* If we already hold the lock, we can skip some rechecking */
		if (!locked) {
-			locked = compact_trylock_irqsave(zone_lru_lock(zone),
-								&flags, cc);
+			locked = compact_trylock(zone_lru_lock(zone), &flags, cc,
+						 spin_lock_irqsave,
+						 spin_trylock_irqsave);
			if (!locked)
				break;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c21775f196b..18fde0139f4a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1113,7 +1113,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
	zonelist = node_zonelist(nid, gfp_mask);
	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), nodemask) {
-		spin_lock_irqsave(&zone->lock, flags);
+		write_lock_irqsave(&zone->lock, flags);

		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
@@ -1125,16 +1125,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
				 * spinning on this lock, it may win the race
				 * and cause alloc_contig_range() to fail...
				 */
-				spin_unlock_irqrestore(&zone->lock, flags);
+				write_unlock_irqrestore(&zone->lock, flags);
				ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
				if (!ret)
					return pfn_to_page(pfn);
-				spin_lock_irqsave(&zone->lock, flags);
+				write_lock_irqsave(&zone->lock, flags);
			}
			pfn += nr_pages;
		}
-		spin_unlock_irqrestore(&zone->lock, flags);
+		write_unlock_irqrestore(&zone->lock, flags);
	}

	return NULL;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05e983f42316..38e39ccdd6d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1133,7 +1133,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
		} while (--count && --batch_free && !list_empty(list));
	}

-	spin_lock(&zone->lock);
+	write_lock(&zone->lock);
	isolated_pageblocks = has_isolate_pageblock(zone);

	/*
@@ -1151,7 +1151,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
		trace_mm_page_pcpu_drain(page, 0, mt);
	}
-	spin_unlock(&zone->lock);
+	write_unlock(&zone->lock);
 }

 static void free_one_page(struct zone *zone,
@@ -1159,13 +1159,13 @@ static void free_one_page(struct zone *zone,
				unsigned int order,
				int migratetype)
 {
-	spin_lock(&zone->lock);
+	write_lock(&zone->lock);
	if (unlikely(has_isolate_pageblock(zone) ||
		is_migrate_isolate(migratetype))) {
		migratetype = get_pfnblock_migratetype(page, pfn);
	}
	__free_one_page(page, pfn, zone, order, migratetype);
-	spin_unlock(&zone->lock);
+	write_unlock(&zone->lock);
 }

 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
@@ -2251,7 +2251,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
	if (zone->nr_reserved_highatomic >= max_managed)
		return;

-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);

	/* Recheck the nr_reserved_highatomic limit under the lock */
	if (zone->nr_reserved_highatomic >= max_managed)
@@ -2267,7 +2267,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
	}

 out_unlock:
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);
 }

 /*
@@ -2300,7 +2300,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
					pageblock_nr_pages)
			continue;

-		spin_lock_irqsave(&zone->lock, flags);
+		write_lock_irqsave(&zone->lock, flags);
		for (order = 0; order < MAX_ORDER; order++) {
			struct free_area *area = &(zone->free_area[order]);

@@ -2343,11 +2343,11 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
			ret = move_freepages_block(zone, page, ac->migratetype,
						   NULL);
			if (ret) {
-				spin_unlock_irqrestore(&zone->lock, flags);
+				write_unlock_irqrestore(&zone->lock, flags);
				return ret;
			}
		}
-		spin_unlock_irqrestore(&zone->lock, flags);
+		write_unlock_irqrestore(&zone->lock, flags);
	}

	return false;
@@ -2465,7 +2465,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 {
	int i, alloced = 0;

-	spin_lock(&zone->lock);
+	write_lock(&zone->lock);
	for (i = 0; i < count; ++i) {
		struct page *page = __rmqueue(zone, order, migratetype);
		if (unlikely(page == NULL))
@@ -2498,7 +2498,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
	 * pages added to the pcp list.
	 */
	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
-	spin_unlock(&zone->lock);
+	write_unlock(&zone->lock);
	return alloced;
 }

@@ -2687,7 +2687,7 @@ void mark_free_pages(struct zone *zone)
	if (zone_is_empty(zone))
		return;

-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);

	max_zone_pfn = zone_end_pfn(zone);
	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
@@ -2721,7 +2721,7 @@ void mark_free_pages(struct zone *zone)
			}
		}
	}
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);
 }
 #endif /* CONFIG_PM */

@@ -2990,7 +2990,7 @@ struct page *rmqueue(struct zone *preferred_zone,
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);

	do {
		page = NULL;
@@ -3002,7 +3002,7 @@ struct page *rmqueue(struct zone *preferred_zone,
		if (!page)
			page = __rmqueue(zone, order, migratetype);
	} while (page && check_new_pages(page, order));
-	spin_unlock(&zone->lock);
+	write_unlock(&zone->lock);
	if (!page)
		goto failed;
	__mod_zone_freepage_state(zone, -(1 << order),
@@ -5009,7 +5009,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
		show_node(zone);
		printk(KERN_CONT "%s: ", zone->name);

-		spin_lock_irqsave(&zone->lock, flags);
+		write_lock_irqsave(&zone->lock, flags);
		for (order = 0; order < MAX_ORDER; order++) {
			struct free_area *area = &zone->free_area[order];
			int type;
@@ -5023,7 +5023,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
				types[order] |= 1 << type;
			}
		}
-		spin_unlock_irqrestore(&zone->lock, flags);
+		write_unlock_irqrestore(&zone->lock, flags);
		for (order = 0; order < MAX_ORDER; order++) {
			printk(KERN_CONT "%lu*%lukB ",
			       nr[order], K(1UL) << order);
@@ -6247,7 +6247,7 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
	zone_set_nid(zone, nid);
	zone->name = zone_names[idx];
	zone->zone_pgdat = NODE_DATA(nid);
-	spin_lock_init(&zone->lock);
+	rwlock_init(&zone->lock);
	zone_seqlock_init(zone);
	zone_pcp_init(zone);
 }
@@ -7239,7 +7239,7 @@ static void __setup_per_zone_wmarks(void)
	for_each_zone(zone) {
		u64 tmp;

-		spin_lock_irqsave(&zone->lock, flags);
+		write_lock_irqsave(&zone->lock, flags);
		tmp = (u64)pages_min * zone->managed_pages;
		do_div(tmp, lowmem_pages);
		if (is_highmem(zone)) {
@@ -7277,7 +7277,7 @@ static void __setup_per_zone_wmarks(void)
		zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;

-		spin_unlock_irqrestore(&zone->lock, flags);
+		write_unlock_irqrestore(&zone->lock, flags);
	}

	/* update totalreserve_pages */
@@ -8041,7 +8041,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
		return;
	offline_mem_sections(pfn, end_pfn);
	zone = page_zone(pfn_to_page(pfn));
-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);
	pfn = start_pfn;
	while (pfn < end_pfn) {
		if (!pfn_valid(pfn)) {
@@ -8073,7 +8073,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
			SetPageReserved((page+i));
		pfn += (1 << order);
	}
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);
 }
 #endif

@@ -8084,14 +8084,14 @@ bool is_free_buddy_page(struct page *page)
	unsigned long flags;
	unsigned int order;

-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);
	for (order = 0; order < MAX_ORDER; order++) {
		struct page *page_head = page - (pfn & ((1 << order) - 1));

		if (PageBuddy(page_head) && page_order(page_head) >= order)
			break;
	}
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);

	return order < MAX_ORDER;
 }
@@ -8110,7 +8110,7 @@ bool set_hwpoison_free_buddy_page(struct page *page)
	unsigned int order;
	bool hwpoisoned = false;

-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);
	for (order = 0; order < MAX_ORDER; order++) {
		struct page *page_head = page - (pfn & ((1 << order) - 1));
@@ -8120,7 +8120,7 @@ bool set_hwpoison_free_buddy_page(struct page *page)
			break;
		}
	}
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);

	return hwpoisoned;
 }
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 43e085608846..5c99fc2a1616 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -26,7 +26,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype,

	zone = page_zone(page);

-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);

	/*
	 * We assume the caller intended to SET migrate type to isolate.
@@ -82,7 +82,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
		__mod_zone_freepage_state(zone, -nr_pages, mt);
	}

-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);
	if (!ret)
		drain_all_pages(zone);
	return ret;
@@ -98,7 +98,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
	struct page *buddy;

	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);
	if (!is_migrate_isolate_page(page))
		goto out;

@@ -137,7 +137,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
	set_pageblock_migratetype(page, migratetype);
	zone->nr_isolate_pageblock--;
 out:
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);
	if (isolated_page) {
		post_alloc_hook(page, order, __GFP_MOVABLE);
		__free_pages(page, order);
@@ -299,10 +299,10 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
		return -EBUSY;

	/* Check all pages are free or marked as ISOLATED */
	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
+	write_lock_irqsave(&zone->lock, flags);
	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
						skip_hwpoisoned_pages);
-	spin_unlock_irqrestore(&zone->lock, flags);
+	write_unlock_irqrestore(&zone->lock, flags);

	trace_test_pages_isolated(start_pfn, end_pfn, pfn);

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ba0870ecddd..06d79271a8ae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1337,10 +1337,10 @@ static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
			continue;

		if (!nolock)
-			spin_lock_irqsave(&zone->lock, flags);
+			write_lock_irqsave(&zone->lock, flags);
		print(m, pgdat, zone);
		if (!nolock)
-			spin_unlock_irqrestore(&zone->lock, flags);
+			write_unlock_irqrestore(&zone->lock, flags);
	}
 }
 #endif
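[Illustration of the direction this conversion enables, not code from
the series: with zone->lock as a rwlock, a later patch can move the
page free path to read (shared) mode while paths needing full
exclusion stay in write mode. The function below is a hypothetical
sketch; free_one_page in this patch still takes the lock as writer.]

	/* Sketch: hypothetical later form of the free path. */
	static void free_one_page_concurrent(struct zone *zone,
			struct page *page, unsigned long pfn,
			unsigned int order, int migratetype)
	{
		/*
		 * Shared mode: many CPUs may free pages at once, which
		 * requires smp_list_* primitives inside __free_one_page
		 * to keep the free lists consistent.
		 */
		read_lock(&zone->lock);
		__free_one_page(page, pfn, zone, order, migratetype);
		read_unlock(&zone->lock);
	}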
From patchwork Tue Sep 11 05:36:12 2018
X-Patchwork-Submitter: Aaron Lu <aaron.lu@intel.com>
X-Patchwork-Id: 10595069
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 5/9] mm/page_alloc: use helper functions to add/remove a page to/from buddy
Date: Tue, 11 Sep 2018 13:36:12 +0800
Message-Id: <20180911053616.6894-6-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>
From patchwork Tue Sep 11 05:36:12 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595069
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
 Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 5/9] mm/page_alloc: use helper functions to add/remove a
 page to/from buddy
Date: Tue, 11 Sep 2018 13:36:12 +0800
Message-Id: <20180911053616.6894-6-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

There are multiple places that add/remove a page to/from buddy; introduce
helper functions for them. This also makes it easier to add code wherever a
page is added to or removed from buddy. No functionality change.
Acked-by: Vlastimil Babka
Signed-off-by: Aaron Lu
---
 mm/page_alloc.c | 65 +++++++++++++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38e39ccdd6d9..d0b954783f1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -697,12 +697,41 @@ static inline void set_page_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }

+static inline void add_to_buddy_common(struct page *page, struct zone *zone,
+					unsigned int order)
+{
+	set_page_order(page, order);
+	zone->free_area[order].nr_free++;
+}
+
+static inline void add_to_buddy_head(struct page *page, struct zone *zone,
+					unsigned int order, int mt)
+{
+	add_to_buddy_common(page, zone, order);
+	list_add(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
+static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
+					unsigned int order, int mt)
+{
+	add_to_buddy_common(page, zone, order);
+	list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 }

+static inline void remove_from_buddy(struct page *page, struct zone *zone,
+					unsigned int order)
+{
+	list_del(&page->lru);
+	zone->free_area[order].nr_free--;
+	rmv_page_order(page);
+}
+
 /*
  * This function checks whether a page is free && is the buddy
  * we can coalesce a page and its buddy if
@@ -803,13 +832,10 @@ static inline void __free_one_page(struct page *page,
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
 		 */
-		if (page_is_guard(buddy)) {
+		if (page_is_guard(buddy))
 			clear_page_guard(zone, buddy, order, migratetype);
-		} else {
-			list_del(&buddy->lru);
-			zone->free_area[order].nr_free--;
-			rmv_page_order(buddy);
-		}
+		else
+			remove_from_buddy(buddy, zone, order);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -841,8 +867,6 @@ static inline void __free_one_page(struct page *page,
 	}

 done_merging:
-	set_page_order(page, order);
-
 	/*
 	 * If this is not the largest possible page, check if the buddy
 	 * of the next-highest order is free. If it is, it's possible
@@ -859,15 +883,12 @@ static inline void __free_one_page(struct page *page,
 		higher_buddy = higher_page + (buddy_pfn - combined_pfn);
 		if (pfn_valid_within(buddy_pfn) &&
 		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
-			list_add_tail(&page->lru,
-				&zone->free_area[order].free_list[migratetype]);
-			goto out;
+			add_to_buddy_tail(page, zone, order, migratetype);
+			return;
 		}
 	}

-	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
-out:
-	zone->free_area[order].nr_free++;
+	add_to_buddy_head(page, zone, order, migratetype);
 }

 /*
@@ -1805,9 +1826,7 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;

-		list_add(&page[size].lru, &area->free_list[migratetype]);
-		area->nr_free++;
-		set_page_order(&page[size], high);
+		add_to_buddy_head(&page[size], zone, high, migratetype);
 	}
 }

@@ -1951,9 +1970,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 						struct page, lru);
 		if (!page)
 			continue;
-		list_del(&page->lru);
-		rmv_page_order(page);
-		area->nr_free--;
+		remove_from_buddy(page, zone, current_order);
 		expand(zone, page, order, current_order, area, migratetype);
 		set_pcppage_migratetype(page, migratetype);
 		return page;
@@ -2871,9 +2888,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	}

 	/* Remove page from free list */
-	list_del(&page->lru);
-	zone->free_area[order].nr_free--;
-	rmv_page_order(page);
+	remove_from_buddy(page, zone, order);

 	/*
 	 * Set the pageblock if the isolated page is at least half of a
@@ -8066,9 +8081,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		pr_info("remove from free list %lx %d %lx\n",
 			pfn, 1 << order, end_pfn);
 #endif
-		list_del(&page->lru);
-		rmv_page_order(page);
-		zone->free_area[order].nr_free--;
+		remove_from_buddy(page, zone, order);
 		for (i = 0; i < (1 << order); i++)
 			SetPageReserved((page+i));
 		pfn += (1 << order);
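The refactoring is mechanical, but it is what lets the rest of the series change the list primitive and the counter type in exactly one place. Outside the kernel the same encapsulation looks roughly like this (a simplified sketch with hypothetical types, not the patch's code):

#include <stdio.h>

struct node { struct node *prev, *next; };

struct free_area_sketch {
	struct node free_list;		/* list head, like free_list[MIGRATE_TYPES] */
	unsigned long nr_free;		/* patch 6 turns this into an atomic */
};

/* The one place where "a page enters the free list" happens; patch 7 can
 * swap the plain insertion below for smp_list_add() without touching any
 * call site. */
static void add_to_area_head(struct free_area_sketch *a, struct node *n)
{
	n->next = a->free_list.next;
	n->prev = &a->free_list;
	a->free_list.next->prev = n;
	a->free_list.next = n;
	a->nr_free++;
}

static void remove_from_area(struct free_area_sketch *a, struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	a->nr_free--;
}

int main(void)
{
	struct free_area_sketch a = { { &a.free_list, &a.free_list }, 0 };
	struct node n;

	add_to_area_head(&a, &n);
	remove_from_area(&a, &n);
	printf("nr_free=%lu\n", a.nr_free);	/* prints 0 */
	return 0;
}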
From patchwork Tue Sep 11 05:36:13 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595071
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
 Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 6/9] use atomic for free_area[order].nr_free
Date: Tue, 11 Sep 2018 13:36:13 +0800
Message-Id: <20180911053616.6894-7-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

Since we will make free path run concurrently, free_area[].nr_free has to be
atomic.
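To see why, consider two CPUs freeing pages of the same order at once: a plain nr_free++ compiles to a load/modify/store sequence, so concurrent increments can be lost. A minimal userspace sketch of the race and the fix, with C11 atomics standing in for the kernel's atomic_long_t (all names here are illustrative, not from the patch):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static long plain_nr_free;	/* racy, like the old unsigned long counter */
static atomic_long nr_free;	/* safe, like atomic_long_t */

static void *free_pages_worker(void *arg)
{
	for (int i = 0; i < 1000000; i++) {
		plain_nr_free++;	/* load/add/store: updates can be lost */
		atomic_fetch_add_explicit(&nr_free, 1,	/* one atomic RMW, like atomic_long_inc() */
					  memory_order_relaxed);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (int i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, free_pages_worker, NULL);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);

	/* The plain counter usually prints less than 2000000; the atomic
	 * counter always prints exactly 2000000. */
	printf("plain=%ld atomic=%ld\n", plain_nr_free, atomic_load(&nr_free));
	return 0;
}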
Signed-off-by: Aaron Lu
---
 include/linux/mmzone.h |  2 +-
 mm/page_alloc.c        | 12 ++++++------
 mm/vmstat.c            |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 84cfa56e2d19..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -95,7 +95,7 @@ extern int page_group_by_mobility_disabled;

 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
-	unsigned long		nr_free;
+	atomic_long_t		nr_free;
 };

 struct pglist_data;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d0b954783f1d..dff3edc60d71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -701,7 +701,7 @@ static inline void add_to_buddy_common(struct page *page, struct zone *zone,
 					unsigned int order)
 {
 	set_page_order(page, order);
-	zone->free_area[order].nr_free++;
+	atomic_long_inc(&zone->free_area[order].nr_free);
 }

 static inline void add_to_buddy_head(struct page *page, struct zone *zone,
@@ -728,7 +728,7 @@ static inline void remove_from_buddy(struct page *page, struct zone *zone,
 					unsigned int order)
 {
 	list_del(&page->lru);
-	zone->free_area[order].nr_free--;
+	atomic_long_dec(&zone->free_area[order].nr_free);
 	rmv_page_order(page);
 }

@@ -2225,7 +2225,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
 	int i;
 	int fallback_mt;

-	if (area->nr_free == 0)
+	if (atomic_long_read(&area->nr_free) == 0)
 		return -1;

 	*can_steal = false;
@@ -3178,7 +3178,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		struct free_area *area = &z->free_area[o];
 		int mt;

-		if (!area->nr_free)
+		if (atomic_long_read(&area->nr_free) == 0)
 			continue;

 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
@@ -5029,7 +5029,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			struct free_area *area = &zone->free_area[order];
 			int type;

-			nr[order] = area->nr_free;
+			nr[order] = atomic_long_read(&area->nr_free);
 			total += nr[order] << order;

 			types[order] = 0;
@@ -5562,7 +5562,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 	unsigned int order, t;
 	for_each_migratetype_order(order, t) {
 		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
-		zone->free_area[order].nr_free = 0;
+		atomic_long_set(&zone->free_area[order].nr_free, 0);
 	}
 }

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 06d79271a8ae..c1985550bb9f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1030,7 +1030,7 @@ static void fill_contig_page_info(struct zone *zone,
 		unsigned long blocks;

 		/* Count number of free blocks */
-		blocks = zone->free_area[order].nr_free;
+		blocks = atomic_long_read(&zone->free_area[order].nr_free);
 		info->free_blocks_total += blocks;

 		/* Count free base pages */
@@ -1353,7 +1353,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
 	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
 	for (order = 0; order < MAX_ORDER; ++order)
-		seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+		seq_printf(m, "%6lu ", atomic_long_read(&zone->free_area[order].nr_free));
 	seq_putc(m, '\n');
 }
From patchwork Tue Sep 11 05:36:14 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595073
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
 Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 7/9] mm: use read_lock for free path
Date: Tue, 11 Sep 2018 13:36:14 +0800
Message-Id: <20180911053616.6894-8-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

Daniel Jordan's patch has made it possible for multiple threads to operate on
a global list concurrently without taking any lock, using smp_list_del() at
any position and smp_list_add/splice() at the head. This patch applies that
technique to the free lists. To make this work, add_to_buddy_tail() is
removed, since only adding at the list head is safe alongside smp_list_del();
only add_to_buddy() is used.

Once the free path can run concurrently, multiple threads may free pages at
the same time. If two pages being freed are buddies, they can miss the
opportunity to be merged. For this reason, introduce range locks to protect
the merge operation: inside one range, only one merge can happen at a time,
and a page's Buddy status is properly set inside the lock. A range is chosen
as an order-(MAX_ORDER-1) block of pages, since a merge cannot exceed that
order.
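For concreteness, the range a page falls into is just its zone-relative pfn shifted down by MAX_ORDER-1; with the typical x86 MAX_ORDER of 11, each range lock covers 1024 pages (4 MB). A small standalone sketch of the lookup, mirroring the patch's get_range_lock() index computation (types and names here are illustrative):

#include <stdio.h>

#define MAX_ORDER 11	/* typical x86 value; a build-time constant in the kernel */

/* Index of the range lock guarding merges for a given pfn: all pages that
 * could ever merge into one order-(MAX_ORDER-1) block map to one index. */
static unsigned long range_lock_index(unsigned long pfn,
				      unsigned long zone_start_pfn)
{
	return (pfn - zone_start_pfn) >> (MAX_ORDER - 1);
}

int main(void)
{
	/* Pages 0..1023 of the zone share index 0, 1024..2047 share index 1. */
	printf("%lu %lu %lu\n",
	       range_lock_index(0, 0),		/* 0 */
	       range_lock_index(1023, 0),	/* 0 */
	       range_lock_index(1024, 0));	/* 1 */
	return 0;
}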
Signed-off-by: Aaron Lu
---
 include/linux/list.h   |  1 +
 include/linux/mmzone.h |  3 ++
 lib/list.c             | 23 ++++++++++
 mm/page_alloc.c        | 95 +++++++++++++++++++++++-------------------
 4 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 5f203fb55939..608e40f6489e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -49,6 +49,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)

 extern void smp_list_del(struct list_head *entry);
 extern void smp_list_splice(struct list_head *list, struct list_head *head);
+extern void smp_list_add(struct list_head *entry, struct list_head *head);

 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e66b8c63d5d1..0ea52e9bb610 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,6 +467,9 @@ struct zone {
 	/* Primarily protects free_area */
 	rwlock_t		lock;

+	/* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
+	spinlock_t		*range_locks;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)

diff --git a/lib/list.c b/lib/list.c
index 104faa144abf..3ecf62b88c86 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -202,3 +202,26 @@ void smp_list_splice(struct list_head *list, struct list_head *head)
 	/* Simultaneously complete the splice and unlock the head node. */
 	WRITE_ONCE(head->next, first);
 }
+
+void smp_list_add(struct list_head *entry, struct list_head *head)
+{
+	struct list_head *succ;
+
+	/*
+	 * Lock the front of @head by replacing its next pointer with NULL.
+	 * Should another thread be adding to the front, wait until it's done.
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	entry->next = succ;
+	entry->prev = head;
+	succ->prev = entry;
+
+	smp_wmb();
+
+	WRITE_ONCE(head->next, entry);
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dff3edc60d71..5f5cc671bcf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,6 +339,17 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif

+/* Return a pointer to the spinlock for the merge range this page belongs to */
+static inline spinlock_t *get_range_lock(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
+				(MAX_ORDER - 1);
+
+	return &zone->range_locks[range];
+}
+
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
@@ -697,25 +708,12 @@ static inline void set_page_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }

-static inline void add_to_buddy_common(struct page *page, struct zone *zone,
-					unsigned int order)
+static inline void add_to_buddy(struct page *page, struct zone *zone,
+				unsigned int order, int mt)
 {
 	set_page_order(page, order);
 	atomic_long_inc(&zone->free_area[order].nr_free);
-}
-
-static inline void add_to_buddy_head(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add(&page->lru, &zone->free_area[order].free_list[mt]);
-}
-
-static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
 }

 static inline void rmv_page_order(struct page *page)
@@ -724,12 +722,25 @@ static inline void rmv_page_order(struct page *page)
 	set_page_private(page, 0);
 }

+static inline void remove_from_buddy_common(struct page *page,
+				struct zone *zone, unsigned int order)
+{
+	atomic_long_dec(&zone->free_area[order].nr_free);
+	rmv_page_order(page);
+}
+
 static inline void remove_from_buddy(struct page *page, struct zone *zone,
 					unsigned int order)
 {
 	list_del(&page->lru);
-	atomic_long_dec(&zone->free_area[order].nr_free);
-	rmv_page_order(page);
+	remove_from_buddy_common(page, zone, order);
+}
+
+static inline void remove_from_buddy_concurrent(struct page *page,
+				struct zone *zone, unsigned int order)
+{
+	smp_list_del(&page->lru);
+	remove_from_buddy_common(page, zone, order);
 }

 /*
@@ -806,6 +817,7 @@ static inline void __free_one_page(struct page *page,
 	unsigned long uninitialized_var(buddy_pfn);
 	struct page *buddy;
 	unsigned int max_order;
+	spinlock_t *range_lock;

 	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);

@@ -819,6 +831,8 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);

+	range_lock = get_range_lock(page);
+	spin_lock(range_lock);
 continue_merging:
 	while (order < max_order - 1) {
 		buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -835,7 +849,7 @@ static inline void __free_one_page(struct page *page,
 		if (page_is_guard(buddy))
 			clear_page_guard(zone, buddy, order, migratetype);
 		else
-			remove_from_buddy(buddy, zone, order);
+			remove_from_buddy_concurrent(buddy, zone, order);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -867,28 +881,8 @@ static inline void __free_one_page(struct page *page,
 	}

 done_merging:
-	/*
-	 * If this is not the largest possible page, check if the buddy
-	 * of the next-highest order is free. If it is, it's possible
-	 * that pages are being freed that will coalesce soon. In case,
-	 * that is happening, add the free page to the tail of the list
-	 * so it's less likely to be used soon and more likely to be merged
-	 * as a higher order page
-	 */
-	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)) {
-		struct page *higher_page, *higher_buddy;
-		combined_pfn = buddy_pfn & pfn;
-		higher_page = page + (combined_pfn - pfn);
-		buddy_pfn = __find_buddy_pfn(combined_pfn, order + 1);
-		higher_buddy = higher_page + (buddy_pfn - combined_pfn);
-		if (pfn_valid_within(buddy_pfn) &&
-		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
-			add_to_buddy_tail(page, zone, order, migratetype);
-			return;
-		}
-	}
-
-	add_to_buddy_head(page, zone, order, migratetype);
+	add_to_buddy(page, zone, order, migratetype);
+	spin_unlock(range_lock);
 }

 /*
@@ -1154,7 +1148,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (--count && --batch_free && !list_empty(list));
 	}

-	write_lock(&zone->lock);
+	read_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);

 	/*
@@ -1172,7 +1166,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 		trace_mm_page_pcpu_drain(page, 0, mt);
 	}
-	write_unlock(&zone->lock);
+	read_unlock(&zone->lock);
 }

 static void free_one_page(struct zone *zone,
@@ -1826,7 +1820,7 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;

-		add_to_buddy_head(&page[size], zone, high, migratetype);
+		add_to_buddy(&page[size], zone, high, migratetype);
 	}
 }

@@ -6286,6 +6280,18 @@ void __ref free_area_init_core_hotplug(int nid)
 }
 #endif

+static void __init setup_range_locks(struct zone *zone)
+{
+	unsigned long nr = (zone->spanned_pages >> (MAX_ORDER - 1)) + 1;
+	unsigned long size = nr * sizeof(spinlock_t);
+	unsigned long i;
+
+	zone->range_locks = memblock_virt_alloc_node_nopanic(size,
+				zone->zone_pgdat->node_id);
+	for (i = 0; i < nr; i++)
+		spin_lock_init(&zone->range_locks[i]);
+}
+
 /*
  * Set up the zone data structures:
  * - mark all pages reserved
@@ -6357,6 +6363,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		setup_usemap(pgdat, zone, zone_start_pfn, size);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
 		memmap_init(size, nid, j, zone_start_pfn);
+		setup_range_locks(zone);
 	}
 }
From patchwork Tue Sep 11 05:36:15 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595075
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
 Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 8/9] mm: use smp_list_splice() on free path
Date: Tue, 11 Sep 2018 13:36:15 +0800
Message-Id: <20180911053616.6894-9-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

With the free path running concurrently, cache bouncing on the free list head
is severe, since multiple threads can be freeing pages and each free adds a
page at the list head. To improve free-path performance for order-0 pages, we
can choose not to add merged pages to Buddy immediately after a merge, but
keep them on a local percpu list first; once all pages have finished merging,
add the merged pages to Buddy with smp_list_splice() in one go.

This optimization causes a problem, though: a page held on the local percpu
list can be the buddy of another page being freed, and the merge opportunity
for the two is lost. With this patch, we can end up with mergeable pages left
unmerged in Buddy. Because of that, I don't see much value in keeping the
range lock, which was used to avoid exactly this from happening, so the range
lock is removed in this patch.
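The subtle part of the reordering in add_to_buddy() below is publication order: a page must be on the free list before its PageBuddy flag becomes visible, otherwise a concurrent merger can observe PageBuddy and try to smp_list_del() a page that is not linked yet, corrupting the list. In userspace terms this is a release-style publish; a sketch with C11 fences (names here are illustrative, not the kernel's):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct page_sketch {
	struct page_sketch *next;	/* stands in for the lru linkage */
	_Atomic bool buddy;		/* stands in for PageBuddy */
};

static struct page_sketch *_Atomic free_head;

/* Publisher: link first, then set the flag (what patch 8's add_to_buddy()
 * does with smp_list_add(); smp_wmb(); set_page_order()). */
static void publish_free_page(struct page_sketch *page)
{
	page->next = atomic_load(&free_head);
	atomic_store(&free_head, page);		/* page is now reachable */

	/* The linkage must be visible before the flag; the kernel uses
	 * smp_wmb(), a release fence is the C11 analogue. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&page->buddy, true, memory_order_relaxed);
}

/* Observer: only pages with the flag set may be unlinked for merging; if
 * the acquire load sees true, the list linkage is guaranteed visible. */
static bool may_merge(struct page_sketch *page)
{
	return atomic_load_explicit(&page->buddy, memory_order_acquire);
}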
Signed-off-by: Aaron Lu
---
 include/linux/mm.h     |   1 +
 include/linux/mmzone.h |   3 -
 init/main.c            |   1 +
 mm/page_alloc.c        | 151 +++++++++++++++++++++++++----------------
 4 files changed, 95 insertions(+), 61 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..a99ba2cb7a0d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2155,6 +2155,7 @@ extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
+extern void percpu_mergelist_init(void);
 extern void __init mmap_init(void);
 extern void show_mem(unsigned int flags, nodemask_t *nodemask);
 extern long si_mem_available(void);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0ea52e9bb610..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,9 +467,6 @@ struct zone {
 	/* Primarily protects free_area */
 	rwlock_t		lock;

-	/* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
-	spinlock_t		*range_locks;
-
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)

diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..68a428e1bf15 100644
--- a/init/main.c
+++ b/init/main.c
@@ -517,6 +517,7 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	percpu_mergelist_init();
 	mem_init();
 	kmem_cache_init();
 	pgtable_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f5cc671bcf7..df38c3f2a1cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,17 +339,6 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif

-/* Return a pointer to the spinlock for the merge range this page belongs to */
-static inline spinlock_t *get_range_lock(struct page *page)
-{
-	struct zone *zone = page_zone(page);
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
-				(MAX_ORDER - 1);
-
-	return &zone->range_locks[range];
-}
-
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
@@ -711,9 +700,15 @@ static inline void set_page_order(struct page *page, unsigned int order)
 static inline void add_to_buddy(struct page *page, struct zone *zone,
 				unsigned int order, int mt)
 {
+	/*
+	 * Add the page to the free list before setting its PageBuddy flag;
+	 * otherwise another thread doing a merge can notice the PageBuddy
+	 * flag and attempt to merge with a page that is not yet linked,
+	 * causing list corruption.
+	 */
+	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
+	smp_wmb();
 	set_page_order(page, order);
 	atomic_long_inc(&zone->free_area[order].nr_free);
-	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
 }

 static inline void rmv_page_order(struct page *page)
@@ -784,40 +779,17 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }

-/*
- * Freeing function for a buddy system allocator.
- *
- * The concept of a buddy system is to maintain direct-mapped table
- * (containing bit values) for memory blocks of various "orders".
- * The bottom level table contains the map for the smallest allocatable
- * units of memory (here, pages), and each level above it describes
- * pairs of units from the levels below, hence, "buddies".
- * At a high level, all that happens here is marking the table entry
- * at the bottom level available, and propagating the changes upward
- * as necessary, plus some accounting needed to play nicely with other
- * parts of the VM system.
- * At each level, we keep a list of pages, which are heads of continuous
- * free pages of length of (1 << order) and marked with PageBuddy.
- * Page's order is recorded in page_private(page) field.
- * So when we are allocating or freeing one, we can derive the state of the
- * other. That is, if we allocate a small block, and both were
- * free, the remainder of the region must be split into blocks.
- * If a block is freed, and its buddy is also free, then this
- * triggers coalescing into a block of larger size.
- *
- * -- nyc
- */
-
-static inline void __free_one_page(struct page *page,
+/* Return merged page pointer with order updated */
+static inline struct page *do_merge(struct page *page,
 		unsigned long pfn,
-		struct zone *zone, unsigned int order,
+		struct zone *zone, unsigned int *p_order,
 		int migratetype)
 {
 	unsigned long combined_pfn;
 	unsigned long uninitialized_var(buddy_pfn);
 	struct page *buddy;
 	unsigned int max_order;
-	spinlock_t *range_lock;
+	unsigned int order = *p_order;

 	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);

@@ -831,8 +803,6 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);

-	range_lock = get_range_lock(page);
-	spin_lock(range_lock);
 continue_merging:
 	while (order < max_order - 1) {
 		buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -881,8 +851,41 @@ static inline void __free_one_page(struct page *page,
 	}

 done_merging:
+	*p_order = order;
+	return page;
+}
+
+/*
+ * Freeing function for a buddy system allocator.
+ *
+ * The concept of a buddy system is to maintain direct-mapped table
+ * (containing bit values) for memory blocks of various "orders".
+ * The bottom level table contains the map for the smallest allocatable
+ * units of memory (here, pages), and each level above it describes
+ * pairs of units from the levels below, hence, "buddies".
+ * At a high level, all that happens here is marking the table entry
+ * at the bottom level available, and propagating the changes upward
+ * as necessary, plus some accounting needed to play nicely with other
+ * parts of the VM system.
+ * At each level, we keep a list of pages, which are heads of continuous
+ * free pages of length of (1 << order) and marked with PageBuddy.
+ * Page's order is recorded in page_private(page) field.
+ * So when we are allocating or freeing one, we can derive the state of the
+ * other. That is, if we allocate a small block, and both were
+ * free, the remainder of the region must be split into blocks.
+ * If a block is freed, and its buddy is also free, then this
+ * triggers coalescing into a block of larger size.
+ *
+ * -- nyc
+ */
+
+static inline void __free_one_page(struct page *page,
+		unsigned long pfn,
+		struct zone *zone, unsigned int order,
+		int migratetype)
+{
+	page = do_merge(page, pfn, zone, &order, migratetype);
 	add_to_buddy(page, zone, order, migratetype);
-	spin_unlock(range_lock);
 }

 /*
@@ -1081,6 +1084,20 @@ static inline void prefetch_buddy(struct page *page)
 	prefetch(buddy);
 }

+static DEFINE_PER_CPU(struct list_head, merge_lists[MAX_ORDER][MIGRATE_TYPES]);
+
+void __init percpu_mergelist_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		unsigned int order, mt;
+
+		for_each_migratetype_order(order, mt)
+			INIT_LIST_HEAD(per_cpu_ptr(&merge_lists[order][mt], cpu));
+	}
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
@@ -1101,10 +1118,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
+	struct list_head *list;
+	unsigned int order;

 	while (count) {
-		struct list_head *list;
-
 		/*
 		 * Remove pages from lists in a round-robin fashion. A
 		 * batch_free count is maintained that is incremented when an
@@ -1157,15 +1174,46 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	 */
 	list_for_each_entry_safe(page, tmp, &head, lru) {
 		int mt = get_pcppage_migratetype(page);
+		struct page *merged_page;
+
 		/* MIGRATE_ISOLATE page should not go to pcplists */
 		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
 		/* Pageblock could have been isolated meanwhile */
 		if (unlikely(isolated_pageblocks))
 			mt = get_pageblock_migratetype(page);

-		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
+		order = 0;
+		merged_page = do_merge(page, page_to_pfn(page), zone, &order, mt);
+		list_add(&merged_page->lru, this_cpu_ptr(&merge_lists[order][mt]));
 		trace_mm_page_pcpu_drain(page, 0, mt);
 	}
+
+	for_each_migratetype_order(order, migratetype) {
+		unsigned long n;
+		struct list_head *entry;
+
+		list = this_cpu_ptr(&merge_lists[order][migratetype]);
+		if (list_empty(list))
+			continue;
+
+		smp_list_splice(list, &zone->free_area[order].free_list[migratetype]);
+
+		/* Add to list first before setting PageBuddy flag */
+		smp_wmb();
+
+		n = 0;
+		entry = list;
+		do {
+			entry = entry->next;
+			page = list_entry(entry, struct page, lru);
+			set_page_order(page, order);
+			n++;
+		} while (entry != list->prev);
+		INIT_LIST_HEAD(list);
+
+		atomic_long_add(n, &zone->free_area[order].nr_free);
+	}
+
 	read_unlock(&zone->lock);
 }

@@ -6280,18 +6328,6 @@ void __ref free_area_init_core_hotplug(int nid)
 }
 #endif

-static void __init setup_range_locks(struct zone *zone)
-{
-	unsigned long nr = (zone->spanned_pages >> (MAX_ORDER - 1)) + 1;
-	unsigned long size = nr * sizeof(spinlock_t);
-	unsigned long i;
-
-	zone->range_locks = memblock_virt_alloc_node_nopanic(size,
-				zone->zone_pgdat->node_id);
-	for (i = 0; i < nr; i++)
-		spin_lock_init(&zone->range_locks[i]);
-}
-
 /*
  * Set up the zone data structures:
  * - mark all pages reserved
@@ -6363,7 +6399,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		setup_usemap(pgdat, zone, zone_start_pfn, size);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
 		memmap_init(size, nid, j, zone_start_pfn);
-		setup_range_locks(zone);
 	}
 }
From patchwork Tue Sep 11 05:36:16 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595077
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
 Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 9/9] mm: page_alloc: merge before sending pages to global
 pool
Date: Tue, 11 Sep 2018 13:36:16 +0800
Message-Id: <20180911053616.6894-10-aaron.lu@intel.com>
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

Now that mergeable pages can be left unmerged in Buddy, this is a step to
reduce that to some extent.

Suppose two buddy pages are on the list to be freed in free_pcppages_bulk():
the first page goes to merge, but its buddy is not in Buddy yet, so we hold it
locally as an order-0 page; then its buddy goes to merge and cannot merge
either, because we are holding the first page locally instead of having it in
Buddy. The end result is two mergeable buddy pages that fail to merge.

So this patch attempts a merge among these to-be-freed pages before acquiring
any lock; it can, to some extent, reduce the fragmentation caused by the last
patch.

With this change, the pcp_drain tracepoint is no longer easy to use, so I
removed it.
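The pairing in merge_in_pcp() below relies on the usual buddy arithmetic: at order n, a page's buddy pfn is pfn ^ (1 << n), and pfn & buddy_pfn gives the combined (lower) pfn that heads the merged order-(n+1) block. A small standalone illustration (plain C with illustrative names; the kernel helper is __find_buddy_pfn()):

#include <stdio.h>

/* Buddy pfn at a given order: flip the order-th bit. */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

int main(void)
{
	unsigned long pfn = 0x9, buddy, combined;
	unsigned int order = 0;

	buddy = find_buddy_pfn(pfn, order);	/* 0x8 */
	combined = pfn & buddy;			/* 0x8: head of the order-1 block */
	printf("pfn=%#lx buddy=%#lx combined=%#lx\n", pfn, buddy, combined);

	/* combined == pfn tells merge_in_pcp() which page becomes the head:
	 * here combined != pfn, so the buddy (0x8) heads the merged pair. */
	return 0;
}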
Signed-off-by: Aaron Lu
---
 mm/page_alloc.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df38c3f2a1cc..d3eafe857713 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,6 +1098,72 @@ void __init percpu_mergelist_init(void)
 	}
 }

+static inline bool buddy_in_list(struct page *page, struct page *buddy,
+				struct list_head *list)
+{
+	list_for_each_entry_continue(page, list, lru)
+		if (page == buddy)
+			return true;
+
+	return false;
+}
+
+static inline void merge_in_pcp(struct list_head *list)
+{
+	int order;
+	struct page *page;
+
+	/* Set order information to 0 initially since they are PCP pages */
+	list_for_each_entry(page, list, lru)
+		set_page_private(page, 0);
+
+	/*
+	 * Check for mergeable pages at each order.
+	 *
+	 * For each order, check if a page's buddy is also in the list and
+	 * if so, do the merge, then remove the merged buddy from the list.
+	 */
+	for (order = 0; order < MAX_ORDER - 1; order++) {
+		bool has_merge = false;
+
+		page = list_first_entry(list, struct page, lru);
+		while (&page->lru != list) {
+			unsigned long pfn, buddy_pfn, combined_pfn;
+			struct page *buddy, *n;
+
+			if (page_order(page) != order) {
+				page = list_next_entry(page, lru);
+				continue;
+			}
+
+			pfn = page_to_pfn(page);
+			buddy_pfn = __find_buddy_pfn(pfn, order);
+			buddy = page + (buddy_pfn - pfn);
+			if (!buddy_in_list(page, buddy, list) ||
+			    page_order(buddy) != order) {
+				page = list_next_entry(page, lru);
+				continue;
+			}
+
+			combined_pfn = pfn & buddy_pfn;
+			if (combined_pfn == pfn) {
+				set_page_private(page, order + 1);
+				list_del(&buddy->lru);
+				page = list_next_entry(page, lru);
+			} else {
+				set_page_private(buddy, order + 1);
+				n = list_next_entry(page, lru);
+				list_del(&page->lru);
+				page = n;
+			}
+			has_merge = true;
+		}
+
+		if (!has_merge)
+			break;
+	}
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
@@ -1165,6 +1231,12 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (--count && --batch_free && !list_empty(list));
 	}

+	/*
+	 * Before acquiring the possibly heavily contended zone lock, merge
+	 * among these to-be-freed PCP pages before sending them to Buddy.
+	 */
+	merge_in_pcp(&head);
+
 	read_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);

@@ -1182,10 +1254,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		if (unlikely(isolated_pageblocks))
 			mt = get_pageblock_migratetype(page);

-		order = 0;
+		order = page_order(page);
 		merged_page = do_merge(page, page_to_pfn(page), zone, &order, mt);
 		list_add(&merged_page->lru, this_cpu_ptr(&merge_lists[order][mt]));
-		trace_mm_page_pcpu_drain(page, 0, mt);
 	}

 	for_each_migratetype_order(order, migratetype) {