diff mbox series

eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

Message ID tencent_AF886EF226FD9F39D28FE4D9A94A95FA2605@qq.com (mailing list archive)
State Mainlined, archived
Headers show
Series eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization | expand

Commit Message

Wen Yang April 16, 2023, 11:31 a.m. UTC
From: Wen Yang <wenyang.linux@foxmail.com>

For the NON SEMAPHORE eventfd, if its counter has a nonzero value,
then a read(2) returns 8 bytes containing that value, and the counter's
value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
N event_writes vs ONE event_read is possible.

However, the current implementation wakes up the read thread immediately
in eventfd_write so that the cpu utilization increases unnecessarily.

By adding a configurable delay after eventfd_write, these unnecessary
wakeup operations are avoided, thereby reducing cpu utilization.

We used the following test code:

 #include <assert.h>
 #include <errno.h>
 #include <unistd.h>
 #include <stdio.h>
 #include <string.h>
 #include <poll.h>
 #include <sys/eventfd.h>
 #include <sys/prctl.h>

/* Writer side: pin the thread name, then flood the eventfd with an
 * ever-increasing counter value, reporting any write failure. */
void publish(int fd)
{
	unsigned long long counter = 0;
	int rc;

	prctl(PR_SET_NAME, "publish");
	for (;;) {
		counter++;
		rc = write(fd, &counter, sizeof(counter));
		if (rc < 0)
			printf("XXX: write error: %s\n", strerror(errno));
	}
}

/* Reader side: pin the thread name, then poll the eventfd forever,
 * draining its counter whenever it becomes readable. */
void subscribe(int fd)
{
	unsigned long long value = 0;
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	prctl(PR_SET_NAME, "subscribe");

	usleep(10);
	for (;;) {
		rc = poll(&pfd, 1, -1);
		if (rc == -1)
			printf("XXX: poll error: %s\n", strerror(errno));
		if (pfd.revents & POLLIN)
			read(fd, &value, sizeof(value));
	}
}

/*
 * Create a non-blocking eventfd, then fork: the child becomes the
 * subscriber (reader) and the parent the publisher (writer).
 * Neither role returns; the process runs until interrupted.
 */
int main(int argc, char *argv[])
{
	pid_t pid;
	int fd;

	/* The original OR'd EFD_NONBLOCK in twice; once is sufficient. */
	fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	/* eventfd(2) returns -1 on failure; fd 0 itself would be a valid
	 * descriptor, so check for >= 0 rather than truthiness. */
	assert(fd >= 0);

	pid = fork();
	if (pid == 0)
		subscribe(fd);
	else if (pid > 0)
		publish(fd);
	else {
		printf("XXX: fork error!\n");
		return -1;
	}

	return 0;
}

 # taskset -c 2-3 ./a.out

The original cpu usage is as follows:
07:02:55 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:57 PM  all   16.43    0.00   16.28    0.16    0.00    0.00    0.00    0.00    0.00   67.14
07:02:57 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:02:57 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:02:57 PM    2   29.21    0.00   34.83    1.12    0.00    0.00    0.00    0.00    0.00   34.83
07:02:57 PM    3   51.97    0.00   48.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00

07:02:57 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:59 PM  all   18.75    0.00   17.47    2.56    0.00    0.32    0.00    0.00    0.00   60.90
07:02:59 PM    0    6.88    0.00    1.59    5.82    0.00    0.00    0.00    0.00    0.00   85.71
07:02:59 PM    1    1.04    0.00    1.04    2.59    0.00    0.00    0.00    0.00    0.00   95.34
07:02:59 PM    2   26.09    0.00   35.87    0.00    0.00    1.09    0.00    0.00    0.00   36.96
07:02:59 PM    3   52.00    0.00   47.33    0.00    0.00    0.67    0.00    0.00    0.00    0.00

07:02:59 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:01 PM  all   16.15    0.00   16.77    0.00    0.00    0.00    0.00    0.00    0.00   67.08
07:03:01 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:03:01 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:03:01 PM    2   27.47    0.00   36.26    0.00    0.00    0.00    0.00    0.00    0.00   36.26
07:03:01 PM    3   51.30    0.00   48.70    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Then setting the new control parameter, as follows:
echo 5 > /proc/sys/fs/eventfd_wakeup_delay_msec

The cpu usage was observed to decrease by more than 20% (cpu #2, 26% -> 0.x%),  as follows:

07:03:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:03 PM  all   10.31    0.00    8.36    0.00    0.00    0.00    0.00    0.00    0.00   81.34
07:03:03 PM    0    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.99
07:03:03 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:03:03 PM    2    0.52    0.00    1.05    0.00    0.00    0.00    0.00    0.00    0.00   98.43
07:03:03 PM    3   56.59    0.00   43.41    0.00    0.00    0.00    0.00    0.00    0.00    0.00

07:03:03 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:05 PM  all   10.61    0.00    7.82    0.00    0.00    0.00    0.00    0.00    0.00   81.56
07:03:05 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:03:05 PM    1    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.99
07:03:05 PM    2    0.53    0.00    0.53    0.00    0.00    0.00    0.00    0.00    0.00   98.94
07:03:05 PM    3   58.59    0.00   41.41    0.00    0.00    0.00    0.00    0.00    0.00    0.00

07:03:05 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:07 PM  all    8.99    0.00    7.25    0.72    0.00    0.00    0.00    0.00    0.00   83.04
07:03:07 PM    0    0.00    0.00    1.52    2.53    0.00    0.00    0.00    0.00    0.00   95.96
07:03:07 PM    1    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00    0.00   99.50
07:03:07 PM    2    0.54    0.00    0.54    0.00    0.00    0.00    0.00    0.00    0.00   98.92
07:03:07 PM    3   57.55    0.00   42.45    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fu Wei <wefu@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/admin-guide/sysctl/fs.rst | 13 +++++
 fs/eventfd.c                            | 78 ++++++++++++++++++++++++-
 init/Kconfig                            | 19 ++++++
 3 files changed, 109 insertions(+), 1 deletion(-)

Comments

Bagas Sanjaya April 17, 2023, 8:22 a.m. UTC | #1
On Sun, Apr 16, 2023 at 07:31:55PM +0800, wenyang.linux@foxmail.com wrote:
> +eventfd_wakeup_delay_msec
> +------------------

Please match the section underline length as the section text above.

> +Frequent writing of an eventfd can also lead to frequent wakeup of the peer
> +read process, resulting in significant cpu overhead.
> +How ever for the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
> +then a read(2) returns 8 bytes containing that value, and the counter's value

reading eventfd?

> +is reset to zero.
> +So it coule be optimized as follows: N event_writes vs ONE event_read.
> +By adding a configurable delay after eventfd_write, these unnecessary wakeup
> +operations are avoided.

What is the connection from optimization you described to eventfd_write
delay?

Thanks.
kernel test robot April 17, 2023, 12:44 p.m. UTC | #2
Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on vfs-idmapping/for-next]
[also build test WARNING on linus/master v6.3-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/wenyang-linux-foxmail-com/eventfd-support-delayed-wakeup-for-non-semaphore-eventfd-to-reduce-cpu-utilization/20230416-193353
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git for-next
patch link:    https://lore.kernel.org/r/tencent_AF886EF226FD9F39D28FE4D9A94A95FA2605%40qq.com
patch subject: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/ea9214e265bae223a795f144d6ddcac65e8e2084
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review wenyang-linux-foxmail-com/eventfd-support-delayed-wakeup-for-non-semaphore-eventfd-to-reduce-cpu-utilization/20230416-193353
        git checkout ea9214e265bae223a795f144d6ddcac65e8e2084
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304172058.piI49JCE-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Documentation/admin-guide/sysctl/fs.rst:74: WARNING: Title underline too short.

vim +74 Documentation/admin-guide/sysctl/fs.rst

    72	
    73	eventfd_wakeup_delay_msec
  > 74	------------------
    75	Frequent writing of an eventfd can also lead to frequent wakeup of the peer
    76	read process, resulting in significant cpu overhead.
    77	How ever for the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
    78	then a read(2) returns 8 bytes containing that value, and the counter's value
    79	is reset to zero.
    80	So it coule be optimized as follows: N event_writes vs ONE event_read.
    81	By adding a configurable delay after eventfd_write, these unnecessary wakeup
    82	operations are avoided.
    83	The max value is 100 ms.
    84
Jens Axboe April 17, 2023, 2:38 p.m. UTC | #3
On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
> From: Wen Yang <wenyang.linux@foxmail.com>
> 
> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
> then a read(2) returns 8 bytes containing that value, and the counter's
> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
> N event_writes vs ONE event_read is possible.
> 
> However, the current implementation wakes up the read thread immediately
> in eventfd_write so that the cpu utilization increases unnecessarily.
> 
> By adding a configurable delay after eventfd_write, these unnecessary
> wakeup operations are avoided, thereby reducing cpu utilization.

What's the real world use case of this, and what would the expected
delay be there? With using a delayed work item for this, there's
certainly a pretty wide grey zone in terms of delay where this would
perform considerably worse than not doing any delayed wakeups at all.
Wen Yang April 17, 2023, 4:32 p.m. UTC | #4
在 2023/4/17 22:38, Jens Axboe 写道:
> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>> From: Wen Yang <wenyang.linux@foxmail.com>
>>
>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>> then a read(2) returns 8 bytes containing that value, and the counter's
>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>> N event_writes vs ONE event_read is possible.
>>
>> However, the current implementation wakes up the read thread immediately
>> in eventfd_write so that the cpu utilization increases unnecessarily.
>>
>> By adding a configurable delay after eventfd_write, these unnecessary
>> wakeup operations are avoided, thereby reducing cpu utilization.
> What's the real world use case of this, and what would the expected
> delay be there? With using a delayed work item for this, there's
> certainly a pretty wide grey zone in terms of delay where this would
> perform considerably worse than not doing any delayed wakeups at all.


Thanks for your comments.

We have found that the CPU usage of the message middleware is high in our
environment, because sensor messages from MCU are very frequent and 
constantly
reported, possibly several hundred thousand times per second. As a result,
the message receiving thread is frequently awakened to process short 
messages.

The following is the simplified test code:
https://github.com/w-simon/tests/blob/master/src/test.c

And the test code in this patch is further simplified.

Finally, only a configuration item has been added here, allowing users 
to make
more choices.


--

Best wishes,

Wen
Jens Axboe April 19, 2023, 2:15 a.m. UTC | #5
On 4/17/23 10:32?AM, Wen Yang wrote:
> 
> ? 2023/4/17 22:38, Jens Axboe ??:
>> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>
>>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>>> then a read(2) returns 8 bytes containing that value, and the counter's
>>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>>> N event_writes vs ONE event_read is possible.
>>>
>>> However, the current implementation wakes up the read thread immediately
>>> in eventfd_write so that the cpu utilization increases unnecessarily.
>>>
>>> By adding a configurable delay after eventfd_write, these unnecessary
>>> wakeup operations are avoided, thereby reducing cpu utilization.
>> What's the real world use case of this, and what would the expected
>> delay be there? With using a delayed work item for this, there's
>> certainly a pretty wide grey zone in terms of delay where this would
>> perform considerably worse than not doing any delayed wakeups at all.
> 
> 
> Thanks for your comments.
> 
> We have found that the CPU usage of the message middleware is high in
> our environment, because sensor messages from MCU are very frequent
> and constantly reported, possibly several hundred thousand times per
> second. As a result, the message receiving thread is frequently
> awakened to process short messages.
> 
> The following is the simplified test code:
> https://github.com/w-simon/tests/blob/master/src/test.c
> 
> And the test code in this patch is further simplified.
> 
> Finally, only a configuration item has been added here, allowing users
> to make more choices.

I think you'd have a higher chance of getting this in if the delay
setting was per eventfd context, rather than a global thing.
Christian Brauner April 19, 2023, 9:12 a.m. UTC | #6
On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
> On 4/17/23 10:32?AM, Wen Yang wrote:
> > 
> > ? 2023/4/17 22:38, Jens Axboe ??:
> >> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
> >>> From: Wen Yang <wenyang.linux@foxmail.com>
> >>>
> >>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
> >>> then a read(2) returns 8 bytes containing that value, and the counter's
> >>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
> >>> N event_writes vs ONE event_read is possible.
> >>>
> >>> However, the current implementation wakes up the read thread immediately
> >>> in eventfd_write so that the cpu utilization increases unnecessarily.
> >>>
> >>> By adding a configurable delay after eventfd_write, these unnecessary
> >>> wakeup operations are avoided, thereby reducing cpu utilization.
> >> What's the real world use case of this, and what would the expected
> >> delay be there? With using a delayed work item for this, there's
> >> certainly a pretty wide grey zone in terms of delay where this would
> >> perform considerably worse than not doing any delayed wakeups at all.
> > 
> > 
> > Thanks for your comments.
> > 
> > We have found that the CPU usage of the message middleware is high in
> > our environment, because sensor messages from MCU are very frequent
> > and constantly reported, possibly several hundred thousand times per
> > second. As a result, the message receiving thread is frequently
> > awakened to process short messages.
> > 
> > The following is the simplified test code:
> > https://github.com/w-simon/tests/blob/master/src/test.c
> > 
> > And the test code in this patch is further simplified.
> > 
> > Finally, only a configuration item has been added here, allowing users
> > to make more choices.
> 
> I think you'd have a higher chance of getting this in if the delay
> setting was per eventfd context, rather than a global thing.

That patch seems really weird. Is that an established paradigm to
address problems like this through a configured wakeup delay? Because
naively this looks like a pretty brutal hack.
Wen Yang April 19, 2023, 3:23 p.m. UTC | #7
在 2023/4/19 17:12, Christian Brauner 写道:
> On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
>> On 4/17/23 10:32?AM, Wen Yang wrote:
>>> ? 2023/4/17 22:38, Jens Axboe ??:
>>>> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>>>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>>>
>>>>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>>>>> then a read(2) returns 8 bytes containing that value, and the counter's
>>>>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>>>>> N event_writes vs ONE event_read is possible.
>>>>>
>>>>> However, the current implementation wakes up the read thread immediately
>>>>> in eventfd_write so that the cpu utilization increases unnecessarily.
>>>>>
>>>>> By adding a configurable delay after eventfd_write, these unnecessary
>>>>> wakeup operations are avoided, thereby reducing cpu utilization.
>>>> What's the real world use case of this, and what would the expected
>>>> delay be there? With using a delayed work item for this, there's
>>>> certainly a pretty wide grey zone in terms of delay where this would
>>>> perform considerably worse than not doing any delayed wakeups at all.
>>>
>>> Thanks for your comments.
>>>
>>> We have found that the CPU usage of the message middleware is high in
>>> our environment, because sensor messages from MCU are very frequent
>>> and constantly reported, possibly several hundred thousand times per
>>> second. As a result, the message receiving thread is frequently
>>> awakened to process short messages.
>>>
>>> The following is the simplified test code:
>>> https://github.com/w-simon/tests/blob/master/src/test.c
>>>
>>> And the test code in this patch is further simplified.
>>>
>>> Finally, only a configuration item has been added here, allowing users
>>> to make more choices.
>> I think you'd have a higher chance of getting this in if the delay
>> setting was per eventfd context, rather than a global thing.

Thank you.
We will follow your suggestion to change the global configuration to per eventfd.

> That patch seems really weird. Is that an established paradigm to
> address problems like this through a configured wakeup delay? Because
> naively this looks like a pretty brutal hack.

Thanks.

Well, what you are concerned about may be that the rough delay may cause 
additional problems, which is indeed worth considering.

Meanwhile, prolonged and frequent write_eventfd calls are actually 
another type of attack.

If we change it to this:

When a continuous write_eventfd reaches a certain threshold in a short 
period of time, a delay is added as a penalty.

Do you think this is acceptable?


--

Best wishes,

Wen
Jens Axboe April 19, 2023, 4:42 p.m. UTC | #8
On 4/19/23 3:12?AM, Christian Brauner wrote:
> On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
>> On 4/17/23 10:32?AM, Wen Yang wrote:
>>>
>>> ? 2023/4/17 22:38, Jens Axboe ??:
>>>> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>>>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>>>
>>>>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>>>>> then a read(2) returns 8 bytes containing that value, and the counter's
>>>>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>>>>> N event_writes vs ONE event_read is possible.
>>>>>
>>>>> However, the current implementation wakes up the read thread immediately
>>>>> in eventfd_write so that the cpu utilization increases unnecessarily.
>>>>>
>>>>> By adding a configurable delay after eventfd_write, these unnecessary
>>>>> wakeup operations are avoided, thereby reducing cpu utilization.
>>>> What's the real world use case of this, and what would the expected
>>>> delay be there? With using a delayed work item for this, there's
>>>> certainly a pretty wide grey zone in terms of delay where this would
>>>> perform considerably worse than not doing any delayed wakeups at all.
>>>
>>>
>>> Thanks for your comments.
>>>
>>> We have found that the CPU usage of the message middleware is high in
>>> our environment, because sensor messages from MCU are very frequent
>>> and constantly reported, possibly several hundred thousand times per
>>> second. As a result, the message receiving thread is frequently
>>> awakened to process short messages.
>>>
>>> The following is the simplified test code:
>>> https://github.com/w-simon/tests/blob/master/src/test.c
>>>
>>> And the test code in this patch is further simplified.
>>>
>>> Finally, only a configuration item has been added here, allowing users
>>> to make more choices.
>>
>> I think you'd have a higher chance of getting this in if the delay
>> setting was per eventfd context, rather than a global thing.
> 
> That patch seems really weird. Is that an established paradigm to
> address problems like this through a configured wakeup delay? Because
> naively this looks like a pretty brutal hack.

It is odd, and it is a brutal hack. My worries were outlined in an
earlier reply, there's quite a big gap where no delay would be better
and the delay approach would be miserable because it'd cause extra
latency and extra context switches. It'd be much cleaner if you KNEW
there'd be more events coming, as you could then get rid of that delayed
work item completely. And I suspect, if this patch makes sense, that
it'd be better to have a number+time limit as well and if you hit the
event number count that you'd notify inline and put some smarts in the
delayed work handling to just not do anything if nothing is pending.
Wen Yang April 20, 2023, 5:44 p.m. UTC | #9
在 2023/4/20 00:42, Jens Axboe 写道:
> On 4/19/23 3:12?AM, Christian Brauner wrote:
>> On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
>>> On 4/17/23 10:32?AM, Wen Yang wrote:
>>>> ? 2023/4/17 22:38, Jens Axboe ??:
>>>>> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>>>>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>>>>
>>>>>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>>>>>> then a read(2) returns 8 bytes containing that value, and the counter's
>>>>>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>>>>>> N event_writes vs ONE event_read is possible.
>>>>>>
>>>>>> However, the current implementation wakes up the read thread immediately
>>>>>> in eventfd_write so that the cpu utilization increases unnecessarily.
>>>>>>
>>>>>> By adding a configurable delay after eventfd_write, these unnecessary
>>>>>> wakeup operations are avoided, thereby reducing cpu utilization.
>>>>> What's the real world use case of this, and what would the expected
>>>>> delay be there? With using a delayed work item for this, there's
>>>>> certainly a pretty wide grey zone in terms of delay where this would
>>>>> perform considerably worse than not doing any delayed wakeups at all.
>>>>
>>>> Thanks for your comments.
>>>>
>>>> We have found that the CPU usage of the message middleware is high in
>>>> our environment, because sensor messages from MCU are very frequent
>>>> and constantly reported, possibly several hundred thousand times per
>>>> second. As a result, the message receiving thread is frequently
>>>> awakened to process short messages.
>>>>
>>>> The following is the simplified test code:
>>>> https://github.com/w-simon/tests/blob/master/src/test.c
>>>>
>>>> And the test code in this patch is further simplified.
>>>>
>>>> Finally, only a configuration item has been added here, allowing users
>>>> to make more choices.
>>> I think you'd have a higher chance of getting this in if the delay
>>> setting was per eventfd context, rather than a global thing.
>> That patch seems really weird. Is that an established paradigm to
>> address problems like this through a configured wakeup delay? Because
>> naively this looks like a pretty brutal hack.
> It is odd, and it is a brutal hack. My worries were outlined in an
> earlier reply, there's quite a big gap where no delay would be better
> and the delay approach would be miserable because it'd cause extra
> latency and extra context switches. It'd be much cleaner if you KNEW
> there'd be more events coming, as you could then get rid of that delayed
> work item completely. And I suspect, if this patch makes sense, that
> it'd be better to have a number+time limit as well and if you hit the
> event number count that you'd notify inline and put some smarts in the
> delayed work handling to just not do anything if nothing is pending.

Thank you very much for your suggestion.

We will improve the implementation according to your suggestion and send 
the v2 later.


--

Best wishes,

Wen
Wen Yang May 4, 2023, 4:01 p.m. UTC | #10
在 2023/4/21 01:44, Wen Yang 写道:
>
> 在 2023/4/20 00:42, Jens Axboe 写道:
>> On 4/19/23 3:12?AM, Christian Brauner wrote:
>>> On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
>>>> On 4/17/23 10:32?AM, Wen Yang wrote:
>>>>> ? 2023/4/17 22:38, Jens Axboe ??:
>>>>>> On 4/16/23 5:31?AM, wenyang.linux@foxmail.com wrote:
>>>>>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>>>>>
>>>>>>> For the NON SEMAPHORE eventfd, if it's counter has a nonzero value,
>>>>>>> then a read(2) returns 8 bytes containing that value, and the 
>>>>>>> counter's
>>>>>>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
>>>>>>> N event_writes vs ONE event_read is possible.
>>>>>>>
>>>>>>> However, the current implementation wakes up the read thread 
>>>>>>> immediately
>>>>>>> in eventfd_write so that the cpu utilization increases 
>>>>>>> unnecessarily.
>>>>>>>
>>>>>>> By adding a configurable delay after eventfd_write, these 
>>>>>>> unnecessary
>>>>>>> wakeup operations are avoided, thereby reducing cpu utilization.
>>>>>> What's the real world use case of this, and what would the expected
>>>>>> delay be there? With using a delayed work item for this, there's
>>>>>> certainly a pretty wide grey zone in terms of delay where this would
>>>>>> perform considerably worse than not doing any delayed wakeups at 
>>>>>> all.
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>> We have found that the CPU usage of the message middleware is high in
>>>>> our environment, because sensor messages from MCU are very frequent
>>>>> and constantly reported, possibly several hundred thousand times per
>>>>> second. As a result, the message receiving thread is frequently
>>>>> awakened to process short messages.
>>>>>
>>>>> The following is the simplified test code:
>>>>> https://github.com/w-simon/tests/blob/master/src/test.c
>>>>>
>>>>> And the test code in this patch is further simplified.
>>>>>
>>>>> Finally, only a configuration item has been added here, allowing 
>>>>> users
>>>>> to make more choices.
>>>> I think you'd have a higher chance of getting this in if the delay
>>>> setting was per eventfd context, rather than a global thing.
>>> That patch seems really weird. Is that an established paradigm to
>>> address problems like this through a configured wakeup delay? Because
>>> naively this looks like a pretty brutal hack.
>> It is odd, and it is a brutal hack. My worries were outlined in an
>> earlier reply, there's quite a big gap where no delay would be better
>> and the delay approach would be miserable because it'd cause extra
>> latency and extra context switches. It'd be much cleaner if you KNEW
>> there'd be more events coming, as you could then get rid of that delayed
>> work item completely. And I suspect, if this patch makes sense, that
>> it'd be better to have a number+time limit as well and if you hit the
>> event number count that you'd notify inline and put some smarts in the
>> delayed work handling to just not do anything if nothing is pending.
>
> Thank you very much for your suggestion.
>
> We will improve the implementation according to your suggestion and 
> send the v2 later.
>
>
Hi Jens, Christian,

Based on your valuable suggestions and inspiration from TCP's 
/Delayed ACK/ technology, we have reimplemented v2 and are currently 
testing it.

After several days of testing, we will send it again.

Thanks.


--

Best wishes,

Wen
diff mbox series

Patch

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index a321b84eccaa..7baf702c2f72 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -70,6 +70,19 @@  negative dentries which do not map to any files. Instead,
 they help speeding up rejection of non-existing files provided
 by the users.
 
+eventfd_wakeup_delay_msec
+-------------------------
+Frequent writing of an eventfd can also lead to frequent wakeup of the peer
+read process, resulting in significant cpu overhead.
+However, for the NON SEMAPHORE eventfd, if its counter has a nonzero value,
+then a read(2) returns 8 bytes containing that value, and the counter's value
+is reset to zero.
+So it could be optimized as follows: N event_writes vs ONE event_read.
+By adding a configurable delay after eventfd_write, these unnecessary wakeup
+operations are avoided.
+The max value is 100 ms.
+
+Default: 0
 
 file-max & file-nr
 ------------------
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 95850a13ce8d..c34fff843c48 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -41,6 +41,9 @@  struct eventfd_ctx {
 	__u64 count;
 	unsigned int flags;
 	int id;
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+	struct delayed_work dwork;
+#endif
 };
 
 __u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask)
@@ -95,6 +98,9 @@  static void eventfd_free_ctx(struct eventfd_ctx *ctx)
 {
 	if (ctx->id >= 0)
 		ida_simple_remove(&eventfd_ida, ctx->id);
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+	flush_delayed_work(&ctx->dwork);
+#endif
 	kfree(ctx);
 }
 
@@ -256,6 +262,28 @@  static ssize_t eventfd_read(struct kiocb *iocb, struct iov_iter *to)
 	return sizeof(ucnt);
 }
 
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+
+static unsigned long eventfd_wake_delay_jiffies;
+
+static void eventfd_delayed_workfn(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct eventfd_ctx *ctx = container_of(dwork, struct eventfd_ctx, dwork);
+
+	spin_lock_irq(&ctx->wqh.lock);
+	current->in_eventfd = 1;
+	if (ctx->count) {
+		/* waitqueue_active is safe because ctx->wqh.lock is being held here. */
+		if (waitqueue_active(&ctx->wqh))
+			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+	}
+	current->in_eventfd = 0;
+	spin_unlock_irq(&ctx->wqh.lock);
+}
+
+#endif
+
 static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
 			     loff_t *ppos)
 {
@@ -282,8 +310,27 @@  static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 	if (likely(res > 0)) {
 		ctx->count += ucnt;
 		current->in_eventfd = 1;
-		if (waitqueue_active(&ctx->wqh))
+
+		/* waitqueue_active is safe because ctx->wqh.lock is being held here. */
+		if (waitqueue_active(&ctx->wqh)) {
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+			if (ctx->flags & EFD_SEMAPHORE)
+				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+			else {
+				unsigned long delay = eventfd_wake_delay_jiffies;
+
+				if (delay) {
+					if (!delayed_work_pending(&ctx->dwork))
+						queue_delayed_work(system_unbound_wq,
+								&ctx->dwork, delay);
+				} else
+					wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+			}
+#else
 			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+#endif
+		}
+
 		current->in_eventfd = 0;
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
@@ -406,6 +453,9 @@  static int do_eventfd(unsigned int count, int flags)
 	ctx->count = count;
 	ctx->flags = flags;
 	ctx->id = ida_simple_get(&eventfd_ida, 0, 0, GFP_KERNEL);
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+	INIT_DELAYED_WORK(&ctx->dwork, eventfd_delayed_workfn);
+#endif
 
 	flags &= EFD_SHARED_FCNTL_FLAGS;
 	flags |= O_RDWR;
@@ -438,3 +488,29 @@  SYSCALL_DEFINE1(eventfd, unsigned int, count)
 	return do_eventfd(count, 0);
 }
 
+#ifdef CONFIG_EVENTFD_WAKEUP_DELAY
+
+static const unsigned long eventfd_wake_delay_max = HZ / 10;
+
+static struct ctl_table fs_eventfd_ctl[] = {
+	{
+		.procname      = "eventfd_wakeup_delay_msec",
+		.data          = &eventfd_wake_delay_jiffies,
+		.maxlen        = sizeof(eventfd_wake_delay_jiffies),
+		.mode          = 0644,
+		.proc_handler  = proc_doulongvec_ms_jiffies_minmax,
+		.extra1        = SYSCTL_ZERO,
+		.extra2        = (void *)&eventfd_wake_delay_max,
+	},
+	{ }
+};
+
+static int __init init_fs_eventfd_sysctls(void)
+{
+	register_sysctl_init("fs", fs_eventfd_ctl);
+	return 0;
+}
+
+fs_initcall(init_fs_eventfd_sysctls);
+
+#endif /* CONFIG_EVENTFD_WAKEUP_DELAY */
diff --git a/init/Kconfig b/init/Kconfig
index 750d41a38574..23d68bcc1f19 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1629,6 +1629,25 @@  config EVENTFD
 
 	  If unsure, say Y.
 
+if EVENTFD
+config EVENTFD_WAKEUP_DELAY
+	bool "support delayed wakeup for the non-semaphore eventfd" if EXPERT
+	default n
+	depends on SYSCTL
+	help
+	  This option enables the delayed wakeup for the non-semaphore eventfd.
+	  Frequent writing of an eventfd can also lead to frequent wakeup of
+	  the peer read process, resulting in significant cpu overhead.
+	  However, for the NON SEMAPHORE eventfd, if its counter has a
+	  nonzero value, then a read(2) returns 8 bytes containing that value,
+	  and the counter's value is reset to zero.
+	  By adding a configurable delay after eventfd_write, these unnecessary
+	  wakeup operations are avoided.
+
+	  If unsure, say N.
+
+endif # EVENTFD
+
 config SHMEM
 	bool "Use full shmem filesystem" if EXPERT
 	default y