From patchwork Sat Jul 13 02:39:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ma, Yu" X-Patchwork-Id: 13732282 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFA0117591; Sat, 13 Jul 2024 02:13:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836803; cv=none; b=HlMFGVagxDNIw1K2R0GLeMcONdlkl6i3kuuqWu3ZnD46Ruz6kvInXiVt2ABwzj3TZK9fabwj8vaAU3+EiJNcfm6cmHGRG7a/ZYmbcrTlpNySMac42r2vkg6WZazTPFGTHyPTUN/0EucahfgvAm4Eu+lOKHNzrqeMjwJDq43uFEg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836803; c=relaxed/simple; bh=vbUi5zKMqImT3C+nksSqGjIBCDKIitAH3vpPg8VR/+w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Tzaxc6B/haPXc5u8Sc8cqm3cfU1zwHv88bj1MbksRuXGkXdvYEFK/vH8so7QJZy9OFS3d9DqxfEfMU21ZpkDtBRY8KpyNKsIxQ4hDaPp+46PBlaeGEv3Lx5FtCasiSCaAAV1W5qKSbbSu9aPRVKwTpttdkJNRWLX2skTmeYYj64= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QAI3Fh5R; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QAI3Fh5R" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1720836801; x=1752372801; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=vbUi5zKMqImT3C+nksSqGjIBCDKIitAH3vpPg8VR/+w=; b=QAI3Fh5RKOMuFodzxY3Aug6gPo7SZJO3+PMxmqCmc05W//RkQB8qdwgE F9+Eg0USu/qOwwknBQvvBwf+XvuFQmJXqiolC/pxIDdIU0EPhxxrT1aD5 mk9Ar2QZruaHgQxIDgXpFB7MbLjJoPRPqQQkb6pUQiFtxcLmJz0Z1Csat QovvD/NlTkVZsFlgVGD8jqIt43UvoVkUPIYoFQM90BW5GZsBs/ZugnMuK HFK5974C4SWdAMBvfEPnPSE8xfJc5h2Fjf2iMpiqNy3kZLTq2eXyRjI23 +e1MDqkSlamM9tYljjvIW8DXxnvPXvaC8nC8J8ipWW1fXpkHuPegppdbd Q==; X-CSE-ConnectionGUID: R1B9IaOfTg6H90wwVMRtPw== X-CSE-MsgGUID: +qwlZq1ISLuutiEfHo5lQw== X-IronPort-AV: E=McAfee;i="6700,10204,11131"; a="12531269" X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="12531269" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jul 2024 19:13:21 -0700 X-CSE-ConnectionGUID: vADG1FrCSluLzH3d1BXzFA== X-CSE-MsgGUID: WOQ7wzXJTy6laTIRgMJ1Cg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="53449891" Received: from linux-pnp-server-16.sh.intel.com ([10.239.177.152]) by fmviesa005.fm.intel.com with ESMTP; 12 Jul 2024 19:13:19 -0700 From: Yu Ma To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com Subject: [PATCH v4 1/3] fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd() Date: Fri, 12 Jul 2024 22:39:15 -0400 Message-ID: <20240713023917.3967269-2-yu.ma@intel.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com> References: <20240614163416.728752-1-yu.ma@intel.com> <20240713023917.3967269-1-yu.ma@intel.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 alloc_fd() has a sanity check inside to make sure the struct file mapping to 
the allocated fd is NULL. Remove this sanity check, since existing zero initialization and the NULL store when an fd is recycled already guarantee it. Meanwhile, add likely/unlikely annotations and skip the expand_files() call when it is not needed, to reduce the work done under file_lock. Reviewed-by: Tim Chen Signed-off-by: Yu Ma Reviewed-by: Jan Kara --- fs/file.c | 33 ++++++++++++++------------------- 1 file changed, 14 insertions(+), 19 deletions(-) diff --git a/fs/file.c b/fs/file.c index a3b72aa64f11..e1b9d6df7941 100644 --- a/fs/file.c +++ b/fs/file.c @@ -515,7 +515,7 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) if (fd < files->next_fd) fd = files->next_fd; - if (fd < fdt->max_fds) + if (likely(fd < fdt->max_fds)) fd = find_next_fd(fdt, fd); /* @@ -523,19 +523,21 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) * will limit the total number of files that can be opened. */ error = -EMFILE; - if (fd >= end) + if (unlikely(fd >= end)) goto out; - error = expand_files(files, fd); - if (error < 0) - goto out; + if (unlikely(fd >= fdt->max_fds)) { + error = expand_files(files, fd); + if (error < 0) + goto out; - /* - * If we needed to expand the fs array we - * might have blocked - try again. 
+ */ + if (error) + goto repeat; + } if (start <= files->next_fd) files->next_fd = fd + 1; @@ -546,13 +548,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) else __clear_close_on_exec(fd, fdt); error = fd; -#if 1 - /* Sanity check */ - if (rcu_access_pointer(fdt->fd[fd]) != NULL) { - printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd); - rcu_assign_pointer(fdt->fd[fd], NULL); - } -#endif out: spin_unlock(&files->file_lock); @@ -618,7 +613,7 @@ void fd_install(unsigned int fd, struct file *file) rcu_read_unlock_sched(); spin_lock(&files->file_lock); fdt = files_fdtable(files); - BUG_ON(fdt->fd[fd] != NULL); + WARN_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); spin_unlock(&files->file_lock); return; From patchwork Sat Jul 13 02:39:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ma, Yu" X-Patchwork-Id: 13732283 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 70E251B59A; Sat, 13 Jul 2024 02:13:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836807; cv=none; b=hxEkbaQgu6UfyYA3HqN10/nvwrqpluZDbRFFUMgqv69A503XTX2WOs9EMwTY/NfCEbcNCzZjQBn40v45S7B+gPQf/qqkSf/qLvWzwo7Nn1zfCz00WNLDnQFCljAyhI1kV1htgweU51Z6eKfZ8SDf0r6lgBsWRPekuy9wl7cX9x8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836807; c=relaxed/simple; bh=Kh4pfuxGLBxgdiaqyNTwfQleD37uUT2ITDpzkzR/xrE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Oba5kbVkegvVzbTJaE4YpWKqNOtTqx2AM348mqcj9ZBvkhLZRL7vj/6JegwPkOd7r5H/rUcHw9bBLY+e5/+qzIfED1RySp4qeGKrZixP+svZvyufwWdDPUMU0S3SyXmHm3KGKti0UKHY+TynHaJwJOetuSCYrnB3ziTuTUdnwAM= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hC4OsMzE; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hC4OsMzE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1720836805; x=1752372805; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Kh4pfuxGLBxgdiaqyNTwfQleD37uUT2ITDpzkzR/xrE=; b=hC4OsMzEfVG3arH4E70Q1XJmaZkUo1emcNz1QCFkVNTA7H3CejMakhXY RBADPdstOlcQEV3+Oe4Bw5U77u0n2tJE0ffc+NHebwFLVxRoPubIvQzfi 1chapfMNes2O80csGIPlWFqZ6Yw+LNwbB2z8woqBv2nFQIFn2aBfVYK1F YiaPew57QlOLm5OIVK3LcY6GY51U5g2k8eDvNvTHnqvId5gnKOiR6AML6 Lhkxg++e4QMjAnBAoKdN6/64sDkVRslmKHlfKv/BekACLkUDZQjvOKCG6 mOSAuaRI8QHgJ9MNsb7E4LJAfK0jMgQJfev41p5eGBJzeRsWGYlDcd0yN A==; X-CSE-ConnectionGUID: GRPMHdH2RjC8rkFNTerYgw== X-CSE-MsgGUID: D/YVpXdmTJqC0NpbaHVP8Q== X-IronPort-AV: E=McAfee;i="6700,10204,11131"; a="12531275" X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="12531275" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jul 2024 19:13:25 -0700 X-CSE-ConnectionGUID: qaXw09J2RMml9GYf4aC60w== X-CSE-MsgGUID: rxMb0jNBT6Gvdt0jcUujNQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="53449901" Received: from linux-pnp-server-16.sh.intel.com ([10.239.177.152]) by fmviesa005.fm.intel.com with ESMTP; 12 Jul 2024 19:13:23 -0700 From: Yu Ma To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, 
edumazet@google.com Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com Subject: [PATCH v4 2/3] fs/file.c: conditionally clear full_fds Date: Fri, 12 Jul 2024 22:39:16 -0400 Message-ID: <20240713023917.3967269-3-yu.ma@intel.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com> References: <20240614163416.728752-1-yu.ma@intel.com> <20240713023917.3967269-1-yu.ma@intel.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Each group of 64 bits in open_fds maps to a single bit in full_fds_bits. By the time __clear_open_fd() runs, that bit in full_fds_bits has very likely been cleared already. Test the bit in full_fds_bits first and clear it only if it is set, to avoid an unnecessary write and cache-line bouncing. See commit fc90888d07b8 ("vfs: conditionally clear close-on-exec flag") for a similar optimization. With the stock kernel plus patch 1 as the baseline, this improves pts/blogbench-1.1.0 read by 13% and write by 5% on an Intel ICX 160-core configuration with v6.10-rc7. 
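For illustration only, a minimal user-space sketch of the conditional clear described above. All sk_ names, the fixed table size, and the full_writes counter are hypothetical and not part of fs/file.c; the counter exists only to make the skipped store visible.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_WORDS 4 /* hypothetical table size: 4 words of open_fds */

/* Hypothetical stand-in for the two bitmaps kept in struct fdtable. */
struct sk_fdtable {
	unsigned long open_fds[SK_WORDS]; /* one bit per fd */
	unsigned long full_fds_bits[1];   /* one bit per word of open_fds */
	unsigned long full_writes;        /* stores made to full_fds_bits */
};

static bool sk_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL;
}

static void sk_clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / SK_BITS_PER_LONG] &= ~(1UL << (nr % SK_BITS_PER_LONG));
}

/* Mark fd as open; flag the word as full once all of its bits are set. */
static void sk_set_open_fd(unsigned int fd, struct sk_fdtable *fdt)
{
	unsigned int word = fd / SK_BITS_PER_LONG;

	fdt->open_fds[word] |= 1UL << (fd % SK_BITS_PER_LONG);
	if (fdt->open_fds[word] == ~0UL) {
		fdt->full_fds_bits[0] |= 1UL << word;
		fdt->full_writes++;
	}
}

/* Mark fd as free; write full_fds_bits only if its bit is actually set. */
static void sk_clear_open_fd(unsigned int fd, struct sk_fdtable *fdt)
{
	sk_clear_bit(fd, fdt->open_fds);
	fd /= SK_BITS_PER_LONG;
	if (sk_test_bit(fd, fdt->full_fds_bits)) {
		sk_clear_bit(fd, fdt->full_fds_bits);
		fdt->full_writes++;
	}
}
```

In this sketch, filling one word and then closing two fds from it stores to full_fds_bits only on the first close; the second close just reads the bit.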
Reviewed-by: Jan Kara Reviewed-by: Tim Chen Signed-off-by: Yu Ma --- fs/file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/file.c b/fs/file.c index e1b9d6df7941..1be2a5bcc7c4 100644 --- a/fs/file.c +++ b/fs/file.c @@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt) static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt) { __clear_bit(fd, fdt->open_fds); - __clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits); + fd /= BITS_PER_LONG; + if (test_bit(fd, fdt->full_fds_bits)) + __clear_bit(fd, fdt->full_fds_bits); } static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt) From patchwork Sat Jul 13 02:39:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ma, Yu" X-Patchwork-Id: 13732284 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA63F29401; Sat, 13 Jul 2024 02:13:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836819; cv=none; b=K5gEB5lOcyHGPyKyeO5I3qk+bVPEVMg8O17MHEwa1zW3EMQTQkWJa/QSR0VvC1zYBluCPsuu1Z4f5oQvTxreQB9VYWshCEigi9z9cs6/FfsSehRoUxgW7v7rpWWX8/2IPzEg5M7s6EbDkTo0nB8jXIa8d2qxA+NufgnZFKEvosM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720836819; c=relaxed/simple; bh=2Em9aXUK4hj994y2DrlM4htA/Mn72IIDxtASYHeCNb4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PJHHo7rHsfK7b7FEP+HfmPLiwlfcPY3hLXG5z8Nh+94tWFCHJsBwY+gW+TAhJvTL46DRRs4kX4uHo9MsDxV/mDUvBfxlnu9twGDT6cDVCAyr5rApXnjSkTfaSGebOT5jD5ONO/wJNTWXEGYKqW1HW7CBag04k74C3OIpyKCYpPI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=I13Twe6b; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="I13Twe6b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1720836818; x=1752372818; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2Em9aXUK4hj994y2DrlM4htA/Mn72IIDxtASYHeCNb4=; b=I13Twe6bliP/N1tsoZmV0HhkwLJrismCYuvOWxzry+fmF1Q8GiDXLQHU 3xBegFQoKDHwV86wdgc/t3X6Eoh4WhpOBmGw0omtj5LY9DttqxtiwjXSF Ho3aVeuYzInytoDRbrP2hijp3ruZbVFsdchMye+bIRMoo+Cwvky/loXVW iLtbLB/G18kRSwnttotXSWo5VHmqQBXscUGgK0aVmx5kDxBJJw9k5U78d 4dS7xDDSI+CiHIUbml1TEmOlJve9mM8HohgeLFJASs0oBE+UEBbO+PiUt WzlUHEBiaxkIdokxU/bQQ9TsNVDGSKnAUtv0GY1YTesiup5i6L45F8rFm w==; X-CSE-ConnectionGUID: mdm38AozTt2ruVqvQw6bgg== X-CSE-MsgGUID: SavpnTgUTeiB3G46bIwxqA== X-IronPort-AV: E=McAfee;i="6700,10204,11131"; a="12531282" X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="12531282" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jul 2024 19:13:37 -0700 X-CSE-ConnectionGUID: eMkj99wjQe2+EhROswhplQ== X-CSE-MsgGUID: RMdm8jjqShaZtcmnZ6/B0Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,204,1716274800"; d="scan'208";a="53449935" Received: from linux-pnp-server-16.sh.intel.com ([10.239.177.152]) by fmviesa005.fm.intel.com with ESMTP; 12 Jul 2024 19:13:35 -0700 From: Yu Ma To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, 
linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com Subject: [PATCH v4 3/3] fs/file.c: add fast path in find_next_fd() Date: Fri, 12 Jul 2024 22:39:17 -0400 Message-ID: <20240713023917.3967269-4-yu.ma@intel.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com> References: <20240614163416.728752-1-yu.ma@intel.com> <20240713023917.3967269-1-yu.ma@intel.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Skip the two-level search via find_next_zero_bit() when there is a free slot in the word that contains next_fd, because: (1) next_fd is a lower bound for the first free fd. (2) find_next_zero_bit() has a fast path for size <= 64 that speeds up the search. (3) Once the fdt has been expanded (the bitmap size doubles on each expansion), it is never shrunk. The search size therefore grows even though few fds may actually be open. This fast path was proposed by Mateusz Guzik and agreed to by Jan Kara; it is more generic and scalable than the approach in previous versions. On top of patches 1 and 2, it improves pts/blogbench-1.1.0 read by 8% and write by 4% on an Intel ICX 160-core configuration with v6.10-rc7. 
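For illustration only, a minimal user-space sketch of the fast path described above. The sk_ names are hypothetical, full_fds_bits is assumed to fit in one word, and the slow path is a simplified approximation of the two-level search rather than the kernel code. The caller is assumed to pass start below nwords * SK_BITS_PER_LONG, as alloc_fd() does by checking fd against max_fds first.

```c
#include <assert.h>
#include <limits.h>

#define SK_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* First zero bit in word at position >= offset, or SK_BITS_PER_LONG if none. */
static unsigned int sk_next_zero_in_word(unsigned long word, unsigned int offset)
{
	for (unsigned int bit = offset; bit < SK_BITS_PER_LONG; bit++)
		if (!((word >> bit) & 1UL))
			return bit;
	return SK_BITS_PER_LONG;
}

/*
 * Hypothetical, simplified find_next_fd(): returns the first free fd at or
 * after start, or nwords * SK_BITS_PER_LONG when the table is full.
 * Assumes nwords <= SK_BITS_PER_LONG and start < nwords * SK_BITS_PER_LONG.
 */
static unsigned int sk_find_next_fd(const unsigned long *open_fds,
				    unsigned long full_fds_bits,
				    unsigned int nwords, unsigned int start)
{
	unsigned int maxfd = nwords * SK_BITS_PER_LONG;
	unsigned int word = start / SK_BITS_PER_LONG;
	unsigned int bit;

	/* Fast path: look only at the word that contains start. */
	bit = sk_next_zero_in_word(open_fds[word], start % SK_BITS_PER_LONG);
	if (bit < SK_BITS_PER_LONG)
		return word * SK_BITS_PER_LONG + bit;

	/* Slow path, level 2: skip words already flagged as full. */
	while (word < nwords && ((full_fds_bits >> word) & 1UL))
		word++;
	if (word * SK_BITS_PER_LONG > start)
		start = word * SK_BITS_PER_LONG;

	/* Slow path, level 1: scan open_fds word by word from start. */
	while (start < maxfd) {
		word = start / SK_BITS_PER_LONG;
		bit = sk_next_zero_in_word(open_fds[word],
					   start % SK_BITS_PER_LONG);
		if (bit < SK_BITS_PER_LONG)
			return word * SK_BITS_PER_LONG + bit;
		start = (word + 1) * SK_BITS_PER_LONG;
	}
	return maxfd;
}
```

The fast path touches a single word of open_fds, so in the common case where next_fd points into a word with a free slot, full_fds_bits is never read.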
Reviewed-by: Tim Chen Signed-off-by: Yu Ma Reviewed-by: Jan Kara --- fs/file.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/file.c b/fs/file.c index 1be2a5bcc7c4..a3ce6ba30c8c 100644 --- a/fs/file.c +++ b/fs/file.c @@ -488,9 +488,20 @@ struct files_struct init_files = { static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) { + unsigned int bitbit = start / BITS_PER_LONG; + unsigned int bit; + + /* + * Try to avoid looking at the second level bitmap + */ + bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG, + start & (BITS_PER_LONG -1)); + if (bit < BITS_PER_LONG) { + return bit + bitbit * BITS_PER_LONG; + } + unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */ unsigned int maxbit = maxfd / BITS_PER_LONG; - unsigned int bitbit = start / BITS_PER_LONG; bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG; if (bitbit >= maxfd)