From patchwork Fri Sep 14 14:59:22 2018
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-mm@kvack.org
Cc: tglx@linutronix.de, Vlastimil Babka, frederic@kernel.org
Subject: [PATCH 0/2] mm/swap: Add locking for pagevec
Date: Fri, 14 Sep 2018 16:59:22 +0200
Message-Id: <20180914145924.22055-1-bigeasy@linutronix.de>

The swap code synchronizes its access to the
(four) pagevec structs (which are allocated per-CPU) by disabling preemption. This works, and the one struct that needs to be accessed from interrupt context is additionally protected by disabling interrupts. This was audited manually and there is no lockdep coverage for it. There is one case where the per-CPU data of a remote CPU needs to be accessed; this is solved by starting a worker on the remote CPU and waiting for it to finish.

I measured the invocation of lru_add_drain_all() and ensured that it would invoke the drain function, but the drain function would not do anything except the locking (preempt / interrupts on/off) of the individual pagevecs. On a Xeon E5-2650 (2 sockets, 8 cores dual threaded, 32 CPUs in total) I drained CPU4 and measured how long it took, in microseconds:

 t-771   [001] ....   183.165619: lru_add_drain_all_test: took 92
 t-771   [001] ....   183.165710: lru_add_drain_all_test: took 87
 t-771   [001] ....   183.165781: lru_add_drain_all_test: took 68
 t-771   [001] ....   183.165826: lru_add_drain_all_test: took 43
 t-771   [001] ....   183.165837: lru_add_drain_all_test: took 9
 t-771   [001] ....   183.165847: lru_add_drain_all_test: took 9
 t-771   [001] ....   183.165858: lru_add_drain_all_test: took 9
 t-771   [001] ....   183.165868: lru_add_drain_all_test: took 9
 t-771   [001] ....   183.165878: lru_add_drain_all_test: took 9
 t-771   [001] ....   183.165889: lru_add_drain_all_test: took 9

It is mostly the wake-up from idle that takes long; once the CPU is busy and cache hot it goes down to 9us. If all CPUs are busy in user land then

 t-1484  [001] .... 40864.452481: lru_add_drain_all_test: took 12
 t-1484  [001] .... 40864.452492: lru_add_drain_all_test: took 8
 t-1484  [001] .... 40864.452500: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452508: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452516: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452524: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452532: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452540: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452547: lru_add_drain_all_test: took 7
 t-1484  [001] .... 40864.452555: lru_add_drain_all_test: took 7

it goes down to 7us once the cache is hot. Invoking the same test on every CPU:

 t-768   [000] ....    61.508781: lru_add_drain_all_test: took 133
 t-768   [000] ....    61.508892: lru_add_drain_all_test: took 105
 t-768   [000] ....    61.509004: lru_add_drain_all_test: took 108
 t-768   [000] ....    61.509112: lru_add_drain_all_test: took 104
 t-768   [000] ....    61.509220: lru_add_drain_all_test: took 104
 t-768   [000] ....    61.509333: lru_add_drain_all_test: took 109
 t-768   [000] ....    61.509414: lru_add_drain_all_test: took 78
 t-768   [000] ....    61.509493: lru_add_drain_all_test: took 76
 t-768   [000] ....    61.509558: lru_add_drain_all_test: took 63
 t-768   [000] ....    61.509623: lru_add_drain_all_test: took 62

on an idle machine, and once the CPUs are busy:

 t-849   [020] ....   379.429727: lru_add_drain_all_test: took 57
 t-849   [020] ....   379.429777: lru_add_drain_all_test: took 47
 t-849   [020] ....   379.429823: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.429870: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.429916: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.429962: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.430009: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.430055: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.430101: lru_add_drain_all_test: took 45
 t-849   [020] ....   379.430147: lru_add_drain_all_test: took 45

so we get down to 45us.

If the preemption-based locking gets replaced with a per-CPU spin_lock(), then we gain a locking scope on the operation. The spin_lock() should not add much overhead because it is not contended. Having the lock there not only adds lockdep coverage, it also allows accessing the data from a remote CPU. So the work can be done on the CPU that asked for it, and there is no need to wake a CPU from idle (or user land).
With this series applied, the test again. Idle box, all CPUs:

 t-861   [000] ....   861.051780: lru_add_drain_all_test: took 16
 t-861   [000] ....   861.051789: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051797: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051805: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051813: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051821: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051829: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051837: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051844: lru_add_drain_all_test: took 7
 t-861   [000] ....   861.051852: lru_add_drain_all_test: took 7

which is almost the same as the "busy, one CPU" case above. Invoking the test for only a single remote CPU:

 t-863   [020] ....   906.579885: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579887: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579889: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579889: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579890: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579891: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579892: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579892: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579893: lru_add_drain_all_test: took 0
 t-863   [020] ....   906.579894: lru_add_drain_all_test: took 0

and it is less than a microsecond.

Sebastian