From patchwork Fri Sep 2 15:22:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiebin Sun
X-Patchwork-Id: 12963700
From: Jiebin Sun
To: akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
 dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
 legion@kernel.org, manfred@colorfullife.com,
 alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
 tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH] ipc/msg.c: mitigate the lock contention with percpu counter
Date: Fri, 2 Sep 2022 23:22:43 +0800
Message-Id: <20220902152243.479592-1-jiebin.sun@intel.com>
X-Mailer: git-send-email 2.31.1

The msg_bytes and msg_hdrs atomic counters are updated frequently
when the IPC msg queue is in heavy use, causing heavy cache bouncing
and overhead. Changing them to percpu_counters greatly improves
performance. Since there is one unique ipc namespace, the additional
memory cost is minimal. The counts are read only in the msgctl call,
which is infrequent, so the need to sum up the counts across CPUs is
also infrequent.

Apply the patch and test with pts/stress-ng-1.4.0
-- system v message passing (160 threads).

Score gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

Signed-off-by: Jiebin Sun
---
 include/linux/ipc_namespace.h  |  5 +++--
 include/linux/percpu_counter.h |  9 +++++++++
 ipc/msg.c                      | 30 +++++++++++++++++-------------
 lib/percpu_counter.c           |  6 ++++++
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e3e8c8662b49..e8240cf2611a 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct user_namespace;
 
@@ -36,8 +37,8 @@ struct ipc_namespace {
 	unsigned int	msg_ctlmax;
 	unsigned int	msg_ctlmnb;
 	unsigned int	msg_ctlmni;
-	atomic_t	msg_bytes;
-	atomic_t	msg_hdrs;
+	struct percpu_counter percpu_msg_bytes;
+	struct percpu_counter percpu_msg_hdrs;
 
 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 01861eebed79..6eec30122cc3 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -40,6 +40,7 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
 
 void percpu_counter_destroy(struct percpu_counter *fbc);
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
+void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
@@ -138,6 +139,14 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	preempt_enable();
 }
 
+static inline void
+percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	preempt_disable();
+	fbc->count += amount;
+	preempt_enable();
+}
+
 static inline void
 percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
diff --git a/ipc/msg.c b/ipc/msg.c
index a0d05775af2c..1b498537f05e 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -285,10 +286,10 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	rcu_read_unlock();
 
 	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 		free_msg(msg);
 	}
-	atomic_sub(msq->q_cbytes, &ns->msg_bytes);
+	percpu_counter_add_local(&ns->percpu_msg_bytes, -(msq->q_cbytes));
 	ipc_update_pid(&msq->q_lspid, NULL);
 	ipc_update_pid(&msq->q_lrpid, NULL);
 	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
@@ -495,17 +496,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
 	msginfo->msgssz = MSGSSZ;
 	msginfo->msgseg = MSGSEG;
 	down_read(&msg_ids(ns).rwsem);
-	if (cmd == MSG_INFO) {
+	if (cmd == MSG_INFO)
 		msginfo->msgpool = msg_ids(ns).in_use;
-		msginfo->msgmap = atomic_read(&ns->msg_hdrs);
-		msginfo->msgtql = atomic_read(&ns->msg_bytes);
+	max_idx = ipc_get_maxidx(&msg_ids(ns));
+	up_read(&msg_ids(ns).rwsem);
+	if (cmd == MSG_INFO) {
+		msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
+		msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
 	} else {
 		msginfo->msgmap = MSGMAP;
 		msginfo->msgpool = MSGPOOL;
 		msginfo->msgtql = MSGTQL;
 	}
-	max_idx = ipc_get_maxidx(&msg_ids(ns));
-	up_read(&msg_ids(ns).rwsem);
 	return (max_idx < 0) ? 0 : max_idx;
 }
 
@@ -935,8 +937,8 @@ static long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		list_add_tail(&msg->m_list, &msq->q_messages);
 		msq->q_cbytes += msgsz;
 		msq->q_qnum++;
-		atomic_add(msgsz, &ns->msg_bytes);
-		atomic_inc(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, 1);
 	}
 
 	err = 0;
@@ -1159,8 +1161,8 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			msq->q_rtime = ktime_get_real_seconds();
 			ipc_update_pid(&msq->q_lrpid, task_tgid(current));
 			msq->q_cbytes -= msg->m_ts;
-			atomic_sub(msg->m_ts, &ns->msg_bytes);
-			atomic_dec(&ns->msg_hdrs);
+			percpu_counter_add_local(&ns->percpu_msg_bytes, -(msg->m_ts));
+			percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 			ss_wakeup(msq, &wake_q, false);
 
 			goto out_unlock0;
@@ -1303,14 +1305,16 @@ void msg_init_ns(struct ipc_namespace *ns)
 	ns->msg_ctlmnb = MSGMNB;
 	ns->msg_ctlmni = MSGMNI;
 
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
+	percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL);
+	percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL);
 	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
 }
 
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+	percpu_counter_destroy(&ns->percpu_msg_hdrs);
 	free_ipcs(ns, &msg_ids(ns), freeque);
 	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
 	rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht);
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index ed610b75dc32..d33cb750962a 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -72,6 +72,12 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
+void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	this_cpu_add(*fbc->counters, amount);
+}
+EXPORT_SYMBOL(percpu_counter_add_local);
+
 /*
  * This function is both preempt and irq safe. The former is due to explicit
  * preemption disable. The latter is guaranteed by the fact that the slow path

From patchwork Mon Sep 5 19:35:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiebin Sun
X-Patchwork-Id: 12966000
From: Jiebin Sun
To: akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
 dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
 legion@kernel.org, manfred@colorfullife.com,
 alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
 tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH v2 2/2] ipc/msg: mitigate the lock contention with percpu counter
Date: Tue, 6 Sep 2022 03:35:15 +0800
Message-Id: <20220905193516.846647-2-jiebin.sun@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220905193516.846647-1-jiebin.sun@intel.com>
References: <20220902152243.479592-1-jiebin.sun@intel.com>
 <20220905193516.846647-1-jiebin.sun@intel.com>

The msg_bytes and msg_hdrs atomic counters are updated frequently
when the IPC msg queue is in heavy use, causing heavy cache bouncing
and overhead. Changing them to percpu_counters greatly improves
performance. Since there is one unique ipc namespace, the additional
memory cost is minimal. The counts are read only in the msgctl call,
which is infrequent, so the need to sum up the counts across CPUs is
also infrequent.

Apply the patch and test with pts/stress-ng-1.4.0
-- system v message passing (160 threads).

Score gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

Signed-off-by: Jiebin Sun
---
 include/linux/ipc_namespace.h |  5 ++--
 ipc/msg.c                     | 44 ++++++++++++++++++++++++-----------
 ipc/namespace.c               |  5 +++-
 ipc/util.h                    |  4 ++--
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e3e8c8662b49..e8240cf2611a 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct user_namespace;
 
@@ -36,8 +37,8 @@ struct ipc_namespace {
 	unsigned int	msg_ctlmax;
 	unsigned int	msg_ctlmnb;
 	unsigned int	msg_ctlmni;
-	atomic_t	msg_bytes;
-	atomic_t	msg_hdrs;
+	struct percpu_counter percpu_msg_bytes;
+	struct percpu_counter percpu_msg_hdrs;
 
 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
diff --git a/ipc/msg.c b/ipc/msg.c
index a0d05775af2c..87c30decb23f 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -285,10 +286,10 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	rcu_read_unlock();
 
 	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 		free_msg(msg);
 	}
-	atomic_sub(msq->q_cbytes, &ns->msg_bytes);
+	percpu_counter_add_local(&ns->percpu_msg_bytes, -(msq->q_cbytes));
 	ipc_update_pid(&msq->q_lspid, NULL);
 	ipc_update_pid(&msq->q_lrpid, NULL);
 	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
@@ -495,17 +496,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
 	msginfo->msgssz = MSGSSZ;
 	msginfo->msgseg = MSGSEG;
 	down_read(&msg_ids(ns).rwsem);
-	if (cmd == MSG_INFO) {
+	if (cmd == MSG_INFO)
 		msginfo->msgpool = msg_ids(ns).in_use;
-		msginfo->msgmap = atomic_read(&ns->msg_hdrs);
-		msginfo->msgtql = atomic_read(&ns->msg_bytes);
+	max_idx = ipc_get_maxidx(&msg_ids(ns));
+	up_read(&msg_ids(ns).rwsem);
+	if (cmd == MSG_INFO) {
+		msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
+		msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
 	} else {
 		msginfo->msgmap = MSGMAP;
 		msginfo->msgpool = MSGPOOL;
 		msginfo->msgtql = MSGTQL;
 	}
-	max_idx = ipc_get_maxidx(&msg_ids(ns));
-	up_read(&msg_ids(ns).rwsem);
 	return (max_idx < 0) ? 0 : max_idx;
 }
 
@@ -935,8 +937,8 @@ static long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		list_add_tail(&msg->m_list, &msq->q_messages);
 		msq->q_cbytes += msgsz;
 		msq->q_qnum++;
-		atomic_add(msgsz, &ns->msg_bytes);
-		atomic_inc(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, 1);
 	}
 
 	err = 0;
@@ -1159,8 +1161,8 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			msq->q_rtime = ktime_get_real_seconds();
 			ipc_update_pid(&msq->q_lrpid, task_tgid(current));
 			msq->q_cbytes -= msg->m_ts;
-			atomic_sub(msg->m_ts, &ns->msg_bytes);
-			atomic_dec(&ns->msg_hdrs);
+			percpu_counter_add_local(&ns->percpu_msg_bytes, -(msg->m_ts));
+			percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 			ss_wakeup(msq, &wake_q, false);
 
 			goto out_unlock0;
@@ -1297,20 +1299,34 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-void msg_init_ns(struct ipc_namespace *ns)
+int msg_init_ns(struct ipc_namespace *ns)
 {
+	int ret;
+
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
 	ns->msg_ctlmni = MSGMNI;
 
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
+	ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_bytes;
+	ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_hdrs;
 	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	return 0;
+
+	fail_msg_hdrs:
+		percpu_counter_destroy(&ns->percpu_msg_bytes);
+	fail_msg_bytes:
+		return ret;
 }
 
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+	percpu_counter_destroy(&ns->percpu_msg_hdrs);
 	free_ipcs(ns, &msg_ids(ns), freeque);
 	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
 	rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index e1fcaedba4fa..8316ea585733 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -66,8 +66,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	if (!setup_ipc_sysctls(ns))
 		goto fail_mq;
 
+	err = msg_init_ns(ns);
+	if (err)
+		goto fail_put;
+
 	sem_init_ns(ns);
-	msg_init_ns(ns);
 	shm_init_ns(ns);
 
 	return ns;
diff --git a/ipc/util.h b/ipc/util.h
index 2dd7ce0416d8..1b0086c6346f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -64,7 +64,7 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 
 #ifdef CONFIG_SYSVIPC
 void sem_init_ns(struct ipc_namespace *ns);
-void msg_init_ns(struct ipc_namespace *ns);
+int msg_init_ns(struct ipc_namespace *ns);
 void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
@@ -72,7 +72,7 @@ void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
 static inline void sem_init_ns(struct ipc_namespace *ns) { }
-static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline int msg_init_ns(struct ipc_namespace *ns) { return 0;}
 static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
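
The trade-off described in the commit message (cheap local updates, an
occasional full sum on the msgctl path) can be illustrated with a small
userspace sketch. This is not kernel code and not the percpu_counter
implementation: struct sharded_counter, sharded_add_local(),
sharded_sum(), demo_outstanding_bytes(), NSLOTS and CACHELINE are all
hypothetical names, and a "slot" is only a stand-in for a CPU. The
sketch assumes a 64-byte cache line and runs single-threaded, so it
shows the data layout and the bookkeeping, not the contention itself.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS    4   /* hypothetical stand-in for the number of CPUs */
#define CACHELINE 64  /* assumed cache-line size in bytes */

/* One padded slot per simulated CPU, so slots never share a cache line. */
struct slot {
	int64_t count;
	char pad[CACHELINE - sizeof(int64_t)];
};

struct sharded_counter {
	struct slot slots[NSLOTS];
};

/* Fast path: touch only the caller's own slot (the role played by
 * percpu_counter_add_local() in the patch). */
static void sharded_add_local(struct sharded_counter *c, unsigned int slot,
			      int64_t amount)
{
	assert(slot < NSLOTS);
	c->slots[slot].count += amount;
}

/* Slow path: walk every slot to get the total (the role played by
 * percpu_counter_sum() on the MSG_INFO path). */
static int64_t sharded_sum(const struct sharded_counter *c)
{
	int64_t total = 0;

	for (unsigned int i = 0; i < NSLOTS; i++)
		total += c->slots[i].count;
	return total;
}

/* Simulate sends on every slot and receives that all land on slot 0. */
static int64_t demo_outstanding_bytes(void)
{
	struct sharded_counter bytes = {0};

	/* Each simulated CPU "sends" three 16-byte messages. */
	for (unsigned int cpu = 0; cpu < NSLOTS; cpu++)
		for (int i = 0; i < 3; i++)
			sharded_add_local(&bytes, cpu, 16);

	/* Four messages are "received" on slot 0, so that slot goes
	 * negative; only the sum over all slots is meaningful. */
	for (int i = 0; i < 4; i++)
		sharded_add_local(&bytes, 0, -16);

	return sharded_sum(&bytes);
}
```

Calling demo_outstanding_bytes() leaves slot 0 at -16 while the total
is 128 bytes, which is why a reader has to sum every slot instead of
looking at any single one.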