From patchwork Fri Oct 8 16:19:20 2021
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com,
    nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org,
    vbabka@suse.cz, cl@linux.com, paulmck@kernel.org, ppandit@redhat.com,
    Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [RFC 1/3] mm/page_alloc: Simplify __rmqueue_pcplist()'s arguments
Date: Fri, 8 Oct 2021 18:19:20 +0200
Message-Id: <20211008161922.942459-2-nsaenzju@redhat.com>
In-Reply-To: <20211008161922.942459-1-nsaenzju@redhat.com>
References: <20211008161922.942459-1-nsaenzju@redhat.com>

Both users of __rmqueue_pcplist() use the same means to extract the
right list from their per-cpu lists: calculate the index based on the
page's migratetype and order. This data is already being passed to
__rmqueue_pcplist(), so centralize the list extraction process inside
the function.
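For illustration only (not part of the patch): a stand-alone sketch of the
idea, using simplified stand-in types and names rather than the kernel's,
so the shape of the refactor is easy to see: the callers stop computing the
list index and the callee derives it from migratetype and order itself.

    /* Stand-alone sketch, not kernel code. index_for() plays the role of
     * order_to_pindex(); take_page() plays the role of __rmqueue_pcplist()
     * after the change, receiving only pcp, migratetype and order. */
    #include <stdio.h>

    #define NR_LISTS 4

    struct pcp_sketch {
        int count[NR_LISTS];    /* stand-in for the per-migratetype lists */
    };

    /* stand-in for order_to_pindex(): derive the index from its inputs */
    static int index_for(int migratetype, int order)
    {
        return (migratetype + order) % NR_LISTS;
    }

    /* after the refactor: the lookup happens once, inside the callee */
    static int take_page(struct pcp_sketch *pcp, int migratetype, int order)
    {
        int idx = index_for(migratetype, order);    /* centralized here */

        if (pcp->count[idx] == 0)
            return -1;
        return --pcp->count[idx];
    }

    int main(void)
    {
        struct pcp_sketch pcp = { .count = { 2, 0, 1, 3 } };

        /* both "callers" now use the same entry point */
        printf("%d\n", take_page(&pcp, 2, 0));
        printf("%d\n", take_page(&pcp, 0, 0));
        return 0;
    }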
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 mm/page_alloc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..dd89933503b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3600,11 +3600,13 @@ static inline
 struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			int migratetype,
 			unsigned int alloc_flags,
-			struct per_cpu_pages *pcp,
-			struct list_head *list)
+			struct per_cpu_pages *pcp)
 {
+	struct list_head *list;
 	struct page *page;
 
+	list = &pcp->lists[order_to_pindex(migratetype, order)];
+
 	do {
 		if (list_empty(list)) {
 			int batch = READ_ONCE(pcp->batch);
@@ -3643,7 +3645,6 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 			unsigned int alloc_flags)
 {
 	struct per_cpu_pages *pcp;
-	struct list_head *list;
 	struct page *page;
 	unsigned long flags;
 
@@ -3656,8 +3657,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	 */
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
 	pcp->free_factor >>= 1;
-	list = &pcp->lists[order_to_pindex(migratetype, order)];
-	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp);
 	local_unlock_irqrestore(&pagesets.lock, flags);
 	if (page) {
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
@@ -5202,7 +5202,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	struct zone *zone;
 	struct zoneref *z;
 	struct per_cpu_pages *pcp;
-	struct list_head *pcp_list;
 	struct alloc_context ac;
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
@@ -5278,7 +5277,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	/* Attempt the batch allocation */
 	local_lock_irqsave(&pagesets.lock, flags);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
-	pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
 
 	while (nr_populated < nr_pages) {
 
@@ -5288,8 +5286,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			continue;
 		}
 
-		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
-							pcp, pcp_list);
+		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags, pcp);
 		if (unlikely(!page)) {
 			/* Try and get at least one page */
 			if (!nr_populated)

From patchwork Fri Oct 8 16:19:21 2021
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com,
    nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org,
    vbabka@suse.cz, cl@linux.com, paulmck@kernel.org, ppandit@redhat.com,
    Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [RFC 2/3] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly
Date: Fri, 8 Oct 2021 18:19:21 +0200
Message-Id: <20211008161922.942459-3-nsaenzju@redhat.com>
In-Reply-To: <20211008161922.942459-1-nsaenzju@redhat.com>
References: <20211008161922.942459-1-nsaenzju@redhat.com>
In preparation for adding remote pcplists drain support, let's bundle
'struct per_cpu_pages' list heads and page count into a new structure,
'struct pcplists', and have all code access it indirectly through a
pointer. It'll be used by upcoming patches, which will maintain multiple
versions of pcplists and switch the pointer atomically.

free_pcppages_bulk() also gains a new argument, since we want to avoid
dereferencing the pcplists pointer twice per critical section (delimited
by the pagesets local lock).

'struct pcplists' data is marked as __private, so as to make sure nobody
accesses it directly, except for the initialization code. Note that
'struct per_cpu_pages' is used during boot, when no allocation is
possible.
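For illustration only (not part of the patch): a stand-alone user-space
sketch of the indirection this patch introduces, with simplified stand-in
types and names. The point is that every user reaches the lists and their
count through a single pointer, which a later patch can swap atomically.

    /* Stand-alone sketch, not kernel code: the list heads and their count
     * live in an inner struct, and all users go through the 'lp' pointer. */
    #include <stdio.h>

    #define NR_LISTS 4

    struct pcplists_sketch {
        int count;
        int lists[NR_LISTS];    /* stand-in for struct list_head lists[] */
    };

    struct per_cpu_pages_sketch {
        int high;
        int batch;
        struct pcplists_sketch *lp;         /* everyone dereferences this */
        struct pcplists_sketch pcplists;    /* actual storage, "private" */
    };

    static void pcp_init(struct per_cpu_pages_sketch *pcp)
    {
        pcp->lp = &pcp->pcplists;   /* only init code touches the storage */
    }

    static void add_page(struct per_cpu_pages_sketch *pcp, int idx)
    {
        struct pcplists_sketch *lp = pcp->lp;   /* dereference once */

        lp->lists[idx]++;
        lp->count++;
    }

    int main(void)
    {
        struct per_cpu_pages_sketch pcp = { .high = 8, .batch = 4 };

        pcp_init(&pcp);
        add_page(&pcp, 2);
        printf("count=%d\n", pcp.lp->count);
        return 0;
    }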
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 include/linux/mmzone.h | 10 +++++--
 mm/page_alloc.c        | 66 +++++++++++++++++++++++++-----------------
 mm/vmstat.c            |  6 ++--
 3 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6a1d79d84675..fb023da9a181 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -358,7 +358,6 @@ enum zone_watermarks {
 
 /* Fields and list protected by pagesets local_lock in page_alloc.c */
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	short free_factor;	/* batch scaling factor during free */
@@ -366,8 +365,13 @@ struct per_cpu_pages {
 	short expire;		/* When 0, remote pagesets are drained */
 #endif
 
-	/* Lists of pages, one per migrate type stored on the pcp-lists */
-	struct list_head lists[NR_PCP_LISTS];
+	struct pcplists *lp;
+	struct pcplists {
+		/* Number of pages in the lists */
+		int count;
+		/* Lists of pages, one per migrate type stored on the pcp-lists */
+		struct list_head lists[NR_PCP_LISTS];
+	} __private pcplists;
 };
 
 struct per_cpu_zonestat {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd89933503b4..842816f269da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1438,7 +1438,8 @@ static inline void prefetch_buddy(struct page *page)
  * pinned" detection logic.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
-					struct per_cpu_pages *pcp)
+					struct per_cpu_pages *pcp,
+					struct pcplists *lp)
 {
 	int pindex = 0;
 	int batch_free = 0;
@@ -1453,7 +1454,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	 * Ensure proper count is passed which otherwise would stuck in the
 	 * below while (list_empty(list)) loop.
 	 */
-	count = min(pcp->count, count);
+	count = min(lp->count, count);
 	while (count > 0) {
 		struct list_head *list;
 
@@ -1468,7 +1469,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free++;
 			if (++pindex == NR_PCP_LISTS)
 				pindex = 0;
-			list = &pcp->lists[pindex];
+			list = &lp->lists[pindex];
 		} while (list_empty(list));
 
 		/* This is the only non-empty list. Free them all. */
@@ -1508,7 +1509,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			}
 		} while (count > 0 && --batch_free && !list_empty(list));
 	}
-	pcp->count -= nr_freed;
+	lp->count -= nr_freed;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3069,14 +3070,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  */
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 {
+	struct pcplists *lp;
 	unsigned long flags;
 	int to_drain, batch;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 	batch = READ_ONCE(pcp->batch);
-	to_drain = min(pcp->count, batch);
+	lp = pcp->lp;
+	to_drain = min(lp->count, batch);
 	if (to_drain > 0)
-		free_pcppages_bulk(zone, to_drain, pcp);
+		free_pcppages_bulk(zone, to_drain, pcp, lp);
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
 #endif
@@ -3092,12 +3095,14 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	unsigned long flags;
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 
 	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	if (pcp->count)
-		free_pcppages_bulk(zone, pcp->count, pcp);
+	lp = pcp->lp;
+	if (lp->count)
+		free_pcppages_bulk(zone, lp->count, pcp, lp);
 
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
@@ -3158,7 +3163,7 @@ static void drain_local_pages_wq(struct work_struct *work)
  *
  * drain_all_pages() is optimized to only execute on cpus where pcplists are
  * not empty. The check for non-emptiness can however race with a free to
- * pcplist that has not yet increased the pcp->count from 0 to 1. Callers
+ * pcplist that has not yet increased the lp->count from 0 to 1. Callers
  * that need the guarantee that every CPU has drained can disable the
  * optimizing racy check.
 */
@@ -3200,21 +3205,22 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 		struct per_cpu_pages *pcp;
 		struct zone *z;
 		bool has_pcps = false;
+		struct pcplists *lp;
 
 		if (force_all_cpus) {
 			/*
-			 * The pcp.count check is racy, some callers need a
+			 * The lp->count check is racy, some callers need a
 			 * guarantee that no cpu is missed.
 			 */
 			has_pcps = true;
 		} else if (zone) {
-			pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-			if (pcp->count)
+			lp = per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp;
+			if (lp->count)
 				has_pcps = true;
 		} else {
 			for_each_populated_zone(z) {
-				pcp = per_cpu_ptr(z->per_cpu_pageset, cpu);
-				if (pcp->count) {
+				lp = per_cpu_ptr(z->per_cpu_pageset, cpu)->lp;
+				if (lp->count) {
 					has_pcps = true;
 					break;
 				}
@@ -3366,19 +3372,21 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
 {
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 	int high;
 	int pindex;
 
 	__count_vm_event(PGFREE);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
+	lp = pcp->lp;
 	pindex = order_to_pindex(migratetype, order);
-	list_add(&page->lru, &pcp->lists[pindex]);
-	pcp->count += 1 << order;
+	list_add(&page->lru, &lp->lists[pindex]);
+	lp->count += 1 << order;
 	high = nr_pcp_high(pcp, zone);
-	if (pcp->count >= high) {
+	if (lp->count >= high) {
 		int batch = READ_ONCE(pcp->batch);
 
-		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp);
+		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, lp);
 	}
 }
 
@@ -3603,9 +3611,11 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			struct per_cpu_pages *pcp)
 {
 	struct list_head *list;
+	struct pcplists *lp;
 	struct page *page;
 
-	list = &pcp->lists[order_to_pindex(migratetype, order)];
+	lp = pcp->lp;
+	list = &lp->lists[order_to_pindex(migratetype, order)];
 
 	do {
 		if (list_empty(list)) {
@@ -3625,14 +3635,14 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 					batch, list,
 					migratetype, alloc_flags);
 
-			pcp->count += alloced << order;
+			lp->count += alloced << order;
 			if (unlikely(list_empty(list)))
 				return NULL;
 		}
 
 		page = list_first_entry(list, struct page, lru);
 		list_del(&page->lru);
-		pcp->count -= 1 << order;
+		lp->count -= 1 << order;
 	} while (check_new_pcp(page));
 
 	return page;
@@ -5877,7 +5887,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			continue;
 
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 	}
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
@@ -5971,7 +5981,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 		free_pcp = 0;
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 
 		show_node(zone);
 		printk(KERN_CONT
@@ -6012,7 +6022,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(zone_page_state(zone, NR_MLOCK)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->per_cpu_pageset->count)),
+			K(this_cpu_read(zone->per_cpu_pageset)->lp->count),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
@@ -6848,7 +6858,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 
 /*
  * pcp->high and pcp->batch values are related and generally batch is lower
- * than high. They are also related to pcp->count such that count is lower
+ * than high. They are also related to pcp->lp->count such that count is lower
  * than high, and as soon as it reaches high, the pcplist is flushed.
  *
  * However, guaranteeing these relations at all times would require e.g. write
@@ -6856,7 +6866,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
  * thus be prone to error and bad for performance. Thus the update only prevents
 * store tearing. Any new users of pcp->batch and pcp->high should ensure they
 * can cope with those fields changing asynchronously, and fully trust only the
- * pcp->count field on the local CPU with interrupts disabled.
+ * pcp->lp->count field on the local CPU with interrupts disabled.
 *
 * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
 * outside of boot time (or some other assurance that no concurrent updaters
@@ -6876,8 +6886,10 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	memset(pcp, 0, sizeof(*pcp));
 	memset(pzstats, 0, sizeof(*pzstats));
 
+	pcp->lp = &ACCESS_PRIVATE(pcp, pcplists);
+
 	for (pindex = 0; pindex < NR_PCP_LISTS; pindex++)
-		INIT_LIST_HEAD(&pcp->lists[pindex]);
+		INIT_LIST_HEAD(&pcp->lp->lists[pindex]);
 
 	/*
 	 * Set batch and high values safe for a boot pageset. A true percpu
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ce2620344b2..5279d3f34e0b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -856,7 +856,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			 * if not then there is nothing to expire.
 			 */
 			if (!__this_cpu_read(pcp->expire) ||
-			    !__this_cpu_read(pcp->count))
+			    !this_cpu_ptr(pcp)->lp->count)
 				continue;
 
 			/*
@@ -870,7 +870,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			if (__this_cpu_dec_return(pcp->expire))
 				continue;
 
-			if (__this_cpu_read(pcp->count)) {
+			if (this_cpu_ptr(pcp)->lp->count) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
 				changes++;
 			}
@@ -1707,7 +1707,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 			   "\n              high:  %i"
 			   "\n              batch: %i",
 			   i,
-			   pcp->count,
+			   pcp->lp->count,
 			   pcp->high,
 			   pcp->batch);
 #ifdef CONFIG_SMP

From patchwork Fri Oct 8 16:19:22 2021
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com,
    nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org,
    vbabka@suse.cz, cl@linux.com, paulmck@kernel.org, ppandit@redhat.com,
    Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [RFC 3/3] mm/page_alloc: Add remote draining support to per-cpu lists
Date: Fri, 8 Oct 2021 18:19:22 +0200
Message-Id: <20211008161922.942459-4-nsaenzju@redhat.com>
In-Reply-To: <20211008161922.942459-1-nsaenzju@redhat.com>
References: <20211008161922.942459-1-nsaenzju@redhat.com>
page_alloc.c's per-cpu page lists are currently protected using local
locks. While good for performance, this doesn't allow for remote access
to these structures. CPUs requiring system-wide per-cpu list drains get
around this by scheduling drain work on all CPUs. That said, some
setups, like systems with NOHZ_FULL CPUs, aren't well suited to this, as
they can't handle interruptions of any sort.

To mitigate this, replace the current draining mechanism with one that
allows remotely draining the lists in a lock-less manner. It leverages
the fact that the per-cpu page lists are accessed through indirection,
and that the pointer can be updated atomically. Upon draining we now:

 - Atomically switch the per-cpu lists pointers to ones pointing to
   empty lists.

 - Wait for a grace period so that all concurrent writers holding the
   old per-cpu lists pointer finish updating them[1].

 - Remotely flush the old lists now that we know nobody holds a
   reference to them.

Concurrent access to the drain process is protected by a mutex.

RCU guarantees atomicity both while dereferencing the per-cpu lists
pointer and replacing it. It also checks for RCU critical
section/locking correctness, as all writers have to hold their per-cpu
pagesets local lock. Memory ordering on both pointers' data is
guaranteed by synchronize_rcu() and the 'pcpu_drain_mutex'. Also,
synchronize_rcu_expedited() is used to minimize hangs during low memory
situations.

Accesses to the pcplists like the ones in mm/vmstat.c don't require RCU
supervision since they can handle outdated data, but they do use
READ_ONCE() in order to avoid compiler weirdness and be explicit about
the concurrent nature of the pcplists pointer.

As a side effect of all this we now have to promote the spin_lock() in
free_pcppages_bulk() to spin_lock_irqsave(), since not all of its users
enter with interrupts disabled.
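For illustration only (not part of the patch): a stand-alone user-space
sketch of the drain scheme described above, with simplified stand-in types
and names. wait_for_readers() stands in for synchronize_rcu_expedited(),
and the plain pointer swap stands in for rcu_replace_pointer().

    /* Stand-alone sketch, not kernel code: two buffers, an 'lp' pointer
     * used by the owning CPU and a 'drain' pointer the drainer swaps in,
     * then a grace-period wait, then a flush of the old buffer. */
    #include <stdio.h>

    struct lists_sketch {
        int count;
    };

    struct pcp_sketch {
        struct lists_sketch *lp;    /* used by the owning CPU */
        struct lists_sketch *drain; /* spare buffer for the drainer */
        struct lists_sketch buf[2];
    };

    static void wait_for_readers(void)
    {
        /* stand-in: the real code calls synchronize_rcu_expedited() so
         * every writer that grabbed the old 'lp' has finished with it. */
    }

    static void remote_drain(struct pcp_sketch *pcp)
    {
        /* 1) switch the pointers; new work now lands in the empty buffer */
        struct lists_sketch *old = pcp->lp;

        pcp->lp = pcp->drain;
        pcp->drain = old;

        /* 2) wait for everyone still holding the old pointer */
        wait_for_readers();

        /* 3) now it is safe to flush the old buffer remotely */
        printf("flushing %d pages\n", pcp->drain->count);
        pcp->drain->count = 0;
    }

    int main(void)
    {
        struct pcp_sketch pcp = { 0 };

        pcp.lp = &pcp.buf[0];
        pcp.drain = &pcp.buf[1];
        pcp.lp->count = 5;  /* pretend the owning CPU queued some pages */
        remote_drain(&pcp);
        return 0;
    }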
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

[1] Note that whatever concurrent writers were doing, the result was
    going to be flushed anyway as the old mechanism disabled preemption
    as the means for serialization, so per-cpu drain works were already
    stepping over whatever was being processed concurrently to the
    drain call.
---
 include/linux/mmzone.h |  18 ++++++-
 mm/page_alloc.c        | 114 ++++++++++++++++++++---------------------
 mm/vmstat.c            |   6 +--
 3 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb023da9a181..c112e7831c54 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -365,13 +365,27 @@ struct per_cpu_pages {
 	short expire;		/* When 0, remote pagesets are drained */
 #endif
 
-	struct pcplists *lp;
+	/*
+	 * Having two pcplists allows us to remotely flush them in a lock-less
+	 * manner: we atomically switch the 'lp' and 'drain' pointers, wait a
+	 * grace period to synchronize against concurrent users of 'lp', and
+	 * safely free whatever is left in 'drain'.
+	 *
+	 * All accesses to 'lp' are protected by local locks, which also serve
+	 * as RCU critical section delimiters. 'lp' should only be dereferenced
+	 * *once* per critical section.
+	 *
+	 * See mm/page_alloc.c's __drain_all_pages() for the bulk of the remote
+	 * drain implementation.
+	 */
+	struct pcplists __rcu *lp;
+	struct pcplists *drain;
 	struct pcplists {
 		/* Number of pages in the lists */
 		int count;
 		/* Lists of pages, one per migrate type stored on the pcp-lists */
 		struct list_head lists[NR_PCP_LISTS];
-	} __private pcplists;
+	} __private pcplists[2];
 };
 
 struct per_cpu_zonestat {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 842816f269da..d56d06dde66a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -147,13 +147,7 @@ DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
 #endif
 
-/* work_structs for global per-cpu drains */
-struct pcpu_drain {
-	struct zone *zone;
-	struct work_struct work;
-};
 static DEFINE_MUTEX(pcpu_drain_mutex);
-static DEFINE_PER_CPU(struct pcpu_drain, pcpu_drain);
 
 #ifdef CONFIG_GCC_PLUGIN_LATENT_ENTROPY
 volatile unsigned long latent_entropy __latent_entropy;
@@ -1448,6 +1442,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
+	unsigned long flags;
 	LIST_HEAD(head);
 
 	/*
@@ -1511,11 +1506,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	}
 	lp->count -= nr_freed;
 
-	/*
-	 * local_lock_irq held so equivalent to spin_lock_irqsave for
-	 * both PREEMPT_RT and non-PREEMPT_RT configurations.
-	 */
-	spin_lock(&zone->lock);
+	spin_lock_irqsave(&zone->lock, flags);
 	isolated_pageblocks = has_isolate_pageblock(zone);
 
 	/*
@@ -1538,7 +1529,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		__free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
 		trace_mm_page_pcpu_drain(page, order, mt);
 	}
-	spin_unlock(&zone->lock);
+	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
 static void free_one_page(struct zone *zone,
@@ -3076,7 +3067,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 
 	local_lock_irqsave(&pagesets.lock, flags);
 	batch = READ_ONCE(pcp->batch);
-	lp = pcp->lp;
+	lp = rcu_dereference_check(pcp->lp, lockdep_is_held(this_cpu_ptr(&pagesets.lock)));
 	to_drain = min(lp->count, batch);
 	if (to_drain > 0)
 		free_pcppages_bulk(zone, to_drain, pcp, lp);
@@ -3100,7 +3091,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 	local_lock_irqsave(&pagesets.lock, flags);
 
 	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	lp = pcp->lp;
+	lp = rcu_dereference_check(pcp->lp, lockdep_is_held(this_cpu_ptr(&pagesets.lock)));
 	if (lp->count)
 		free_pcppages_bulk(zone, lp->count, pcp, lp);
 
@@ -3139,24 +3130,6 @@ void drain_local_pages(struct zone *zone)
 		drain_pages(cpu);
 }
 
-static void drain_local_pages_wq(struct work_struct *work)
-{
-	struct pcpu_drain *drain;
-
-	drain = container_of(work, struct pcpu_drain, work);
-
-	/*
-	 * drain_all_pages doesn't use proper cpu hotplug protection so
-	 * we can race with cpu offline when the WQ can move this from
-	 * a cpu pinned worker to an unbound one. We can operate on a different
-	 * cpu which is alright but we also have to make sure to not move to
-	 * a different one.
-	 */
-	preempt_disable();
-	drain_local_pages(drain->zone);
-	preempt_enable();
-}
-
 /*
  * The implementation of drain_all_pages(), exposing an extra parameter to
  * drain on all cpus.
@@ -3169,6 +3142,8 @@ static void drain_local_pages_wq(struct work_struct *work)
  */
 static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 {
+	struct per_cpu_pages *pcp;
+	struct zone *z;
 	int cpu;
 
 	/*
@@ -3177,13 +3152,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 	 */
 	static cpumask_t cpus_with_pcps;
 
-	/*
-	 * Make sure nobody triggers this path before mm_percpu_wq is fully
-	 * initialized.
-	 */
-	if (WARN_ON_ONCE(!mm_percpu_wq))
-		return;
-
 	/*
 	 * Do not drain if one is already in progress unless it's specific to
 	 * a zone. Such callers are primarily CMA and memory hotplug and need
@@ -3202,8 +3170,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 	 * disables preemption as part of its processing
 	 */
 	for_each_online_cpu(cpu) {
-		struct per_cpu_pages *pcp;
-		struct zone *z;
 		bool has_pcps = false;
 		struct pcplists *lp;
 
@@ -3214,12 +3180,12 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 			 */
 			has_pcps = true;
 		} else if (zone) {
-			lp = per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp;
+			lp = READ_ONCE(per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp);
 			if (lp->count)
 				has_pcps = true;
 		} else {
 			for_each_populated_zone(z) {
-				lp = per_cpu_ptr(z->per_cpu_pageset, cpu)->lp;
+				lp = READ_ONCE(per_cpu_ptr(z->per_cpu_pageset, cpu)->lp);
 				if (lp->count) {
 					has_pcps = true;
 					break;
@@ -3233,16 +3199,37 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 			cpumask_clear_cpu(cpu, &cpus_with_pcps);
 	}
 
+	if (!force_all_cpus && cpumask_empty(&cpus_with_pcps))
+		goto exit;
+
+	for_each_cpu(cpu, &cpus_with_pcps) {
+		for_each_populated_zone(z) {
+			if (zone && zone != z)
+				continue;
+
+			pcp = per_cpu_ptr(z->per_cpu_pageset, cpu);
+			pcp->drain = rcu_replace_pointer(pcp->lp, pcp->drain,
+					mutex_is_locked(&pcpu_drain_mutex));
+		}
+	}
+
+	synchronize_rcu_expedited();
+
 	for_each_cpu(cpu, &cpus_with_pcps) {
-		struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);
+		for_each_populated_zone(z) {
+			int count;
+
+			pcp = per_cpu_ptr(z->per_cpu_pageset, cpu);
+			count = pcp->drain->count;
+			if (!count)
+				continue;
 
-		drain->zone = zone;
-		INIT_WORK(&drain->work, drain_local_pages_wq);
-		queue_work_on(cpu, mm_percpu_wq, &drain->work);
+			free_pcppages_bulk(z, count, pcp, pcp->drain);
+			VM_BUG_ON(pcp->drain->count);
+		}
 	}
-	for_each_cpu(cpu, &cpus_with_pcps)
-		flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);
 
+exit:
 	mutex_unlock(&pcpu_drain_mutex);
 }
 
@@ -3378,7 +3365,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
 
 	__count_vm_event(PGFREE);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
-	lp = pcp->lp;
+	lp = rcu_dereference_check(pcp->lp, lockdep_is_held(this_cpu_ptr(&pagesets.lock)));
 	pindex = order_to_pindex(migratetype, order);
 	list_add(&page->lru, &lp->lists[pindex]);
 	lp->count += 1 << order;
@@ -3614,7 +3601,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 	struct pcplists *lp;
 	struct page *page;
 
-	lp = pcp->lp;
+	lp = rcu_dereference_check(pcp->lp, lockdep_is_held(this_cpu_ptr(&pagesets.lock)));
 	list = &lp->lists[order_to_pindex(migratetype, order)];
 
 	do {
@@ -5886,8 +5873,12 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
 
-		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
+		for_each_online_cpu(cpu) {
+			struct pcplists *lp;
+
+			lp = READ_ONCE(per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp);
+			free_pcp += lp->count;
+		}
 	}
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
@@ -5980,8 +5971,12 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			continue;
 
 		free_pcp = 0;
-		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
+		for_each_online_cpu(cpu) {
+			struct pcplists *lp;
+
+			lp = READ_ONCE(per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp);
+			free_pcp += lp->count;
+		}
 
 		show_node(zone);
 		printk(KERN_CONT
@@ -6022,7 +6017,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(zone_page_state(zone, NR_MLOCK)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->per_cpu_pageset)->lp->count),
+			K(READ_ONCE(this_cpu_ptr(zone->per_cpu_pageset)->lp)->count),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
@@ -6886,10 +6881,13 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	memset(pcp, 0, sizeof(*pcp));
 	memset(pzstats, 0, sizeof(*pzstats));
 
-	pcp->lp = &ACCESS_PRIVATE(pcp, pcplists);
+	pcp->lp = &ACCESS_PRIVATE(pcp, pcplists[0]);
+	pcp->drain = &ACCESS_PRIVATE(pcp, pcplists[1]);
 
-	for (pindex = 0; pindex < NR_PCP_LISTS; pindex++)
+	for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) {
 		INIT_LIST_HEAD(&pcp->lp->lists[pindex]);
+		INIT_LIST_HEAD(&pcp->drain->lists[pindex]);
+	}
 
 	/*
 	 * Set batch and high values safe for a boot pageset. A true percpu
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5279d3f34e0b..1ffa4fc64a4f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -856,7 +856,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			 * if not then there is nothing to expire.
 			 */
 			if (!__this_cpu_read(pcp->expire) ||
-			    !this_cpu_ptr(pcp)->lp->count)
+			    !READ_ONCE(this_cpu_ptr(pcp)->lp)->count)
 				continue;
 
 			/*
@@ -870,7 +870,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			if (__this_cpu_dec_return(pcp->expire))
 				continue;
 
-			if (this_cpu_ptr(pcp)->lp->count) {
+			if (READ_ONCE(this_cpu_ptr(pcp)->lp)->count) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
 				changes++;
 			}
@@ -1707,7 +1707,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 			   "\n              high:  %i"
 			   "\n              batch: %i",
 			   i,
-			   pcp->lp->count,
+			   READ_ONCE(pcp->lp)->count,
 			   pcp->high,
 			   pcp->batch);
 #ifdef CONFIG_SMP