From patchwork Tue Oct 22 20:39:02 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13846158
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64
Date: Tue, 22 Oct 2024 14:39:02 -0600
Message-ID: <20241022204708.1025470-2-axboe@kernel.dk>
In-Reply-To: <20241022204708.1025470-1-axboe@kernel.dk>
References: <20241022204708.1025470-1-axboe@kernel.dk>

This avoids intermediate storage for turning a __kernel_timespec user
pointer into an on-stack struct timespec64, only then to turn it into
a ktime_t.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b5974bdad48b..8952453ea807 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2494,9 +2494,10 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 
 struct ext_arg {
 	size_t argsz;
-	struct __kernel_timespec __user *ts;
+	struct timespec64 ts;
 	const sigset_t __user *sig;
 	ktime_t min_time;
+	bool ts_set;
 };
 
 /*
@@ -2534,13 +2535,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.timeout = KTIME_MAX;
 	start_time = io_get_time(ctx);
 
-	if (ext_arg->ts) {
-		struct timespec64 ts;
-
-		if (get_timespec64(&ts, ext_arg->ts))
-			return -EFAULT;
-
-		iowq.timeout = timespec64_to_ktime(ts);
+	if (ext_arg->ts_set) {
+		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
 		if (!(flags & IORING_ENTER_ABS_TIMER))
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
@@ -3251,7 +3247,6 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (!(flags & IORING_ENTER_EXT_ARG)) {
 		ext_arg->sig = (const sigset_t __user *) argp;
-		ext_arg->ts = NULL;
 		return 0;
 	}
 
@@ -3266,7 +3261,11 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
-	ext_arg->ts = u64_to_user_ptr(arg.ts);
+	if (arg.ts) {
+		if (get_timespec64(&ext_arg->ts, u64_to_user_ptr(arg.ts)))
+			return -EFAULT;
+		ext_arg->ts_set = true;
+	}
 	return 0;
 }

From patchwork Tue Oct 22 20:39:03 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13846159
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end
Date: Tue, 22 Oct 2024 14:39:03 -0600
Message-ID: <20241022204708.1025470-3-axboe@kernel.dk>
In-Reply-To: <20241022204708.1025470-1-axboe@kernel.dk>
References: <20241022204708.1025470-1-axboe@kernel.dk>

In scenarios where a high frequency of wait events is seen, the copy
of struct io_uring_getevents_arg is quite noticeable in profiles,
accounting for up to 3.5-4.5% of the time spent. Rewrite the copy-in
logic, saving about 0.5% of the time.
Signed-off-by: Jens Axboe
Reviewed-by: Keith Busch
---
 io_uring/io_uring.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8952453ea807..612e7d66f845 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3239,6 +3239,7 @@ static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t a
 static int io_get_ext_arg(unsigned flags, const void __user *argp,
 			  struct ext_arg *ext_arg)
 {
+	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
 
 	/*
@@ -3256,8 +3257,13 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (ext_arg->argsz != sizeof(arg))
 		return -EINVAL;
-	if (copy_from_user(&arg, argp, sizeof(arg)))
+	if (!user_access_begin(uarg, sizeof(*uarg)))
 		return -EFAULT;
+	unsafe_get_user(arg.sigmask, &uarg->sigmask, uaccess_end);
+	unsafe_get_user(arg.min_wait_usec, &uarg->min_wait_usec, uaccess_end);
+	unsafe_get_user(arg.ts, &uarg->ts, uaccess_end);
+	unsafe_get_user(arg.sigmask_sz, &uarg->sigmask_sz, uaccess_end);
+	user_access_end();
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
@@ -3267,6 +3273,9 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		ext_arg->ts_set = true;
 	}
 	return 0;
+uaccess_end:
+	user_access_end();
+	return -EFAULT;
 }
 
 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,

From patchwork Tue Oct 22 20:39:04 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13846160
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/3] io_uring: add support for fixed wait regions
Date: Tue, 22 Oct 2024 14:39:04 -0600
Message-ID: <20241022204708.1025470-4-axboe@kernel.dk>
In-Reply-To: <20241022204708.1025470-1-axboe@kernel.dk>
References: <20241022204708.1025470-1-axboe@kernel.dk>
Generally applications have one or a few kinds of waits, yet they pass
in a struct io_uring_getevents_arg every time. This needs to get copied
and, in turn, the timeout value needs to get copied.

Rather than do this for every invocation, allow the application to
register a fixed set of wait regions that can simply be indexed when
asking the kernel to wait on events.

At ring setup time, the application can register a number of these wait
regions ala:

	struct io_uring_reg_wait *reg;

	posix_memalign((void **) &reg, page_size, page_size);
	memset(reg, 0, page_size);

	/* set timeout and mark it as set, sigmask/sigmask_sz as needed */
	reg->ts.tv_sec = 0;
	reg->ts.tv_nsec = 100000;
	reg->flags = IORING_REG_WAIT_TS;

	io_uring_register_cqwait_reg(ring, reg, nr_regions);

and instead of doing:

	struct __kernel_timespec timeout = {
		.tv_nsec = 100000,
	};

	io_uring_submit_and_wait_timeout(ring, &cqe, nr, &timeout, NULL);

for each submit_and_wait, or just wait, operation, it can just
reference the above region at offset 0 and do:

	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);

to achieve the same goal of waiting 100usec without needing to copy
both struct io_uring_getevents_arg (24b) and struct __kernel_timespec
(16b) for each invocation.

Struct io_uring_reg_wait looks as follows:

	struct io_uring_reg_wait {
		struct __kernel_timespec	ts;
		__u32				min_wait_usec;
		__u32				flags;
		__u64				sigmask;
		__u32				sigmask_sz;
		__u32				pad[3];
		__u64				pad2[2];
	};

embedding the timeout itself in the region, rather than passing it as
a pointer as well. Note that the signal mask is still passed as a
pointer, both for compatibility reasons and because there doesn't seem
to be a lot of high frequency wait scenarios that involve setting and
resetting the signal mask for each wait.
The application is free to modify any region before a wait call, or it
can keep multiple regions with different settings to avoid needing to
modify the same one between wait calls. Up to a page size of regions is
mapped by default, allowing PAGE_SIZE / 64 available regions for use.

In network performance testing with zero-copy, this reduced the time
spent waiting on the TX side from 3.12% to 0.3% and the RX side from
4.4% to 0.3%.

Wait regions are fixed for the lifetime of the ring - once registered,
they are persistent until the ring is torn down.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  7 ++++
 include/uapi/linux/io_uring.h  | 18 +++++++++
 io_uring/io_uring.c            | 72 ++++++++++++++++++++++++++++------
 io_uring/register.c            | 48 +++++++++++++++++++++++
 4 files changed, 134 insertions(+), 11 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 6d3ee71bd832..40dc1ec37a42 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -327,6 +327,13 @@ struct io_ring_ctx {
 		atomic_t		cq_wait_nr;
 		atomic_t		cq_timeouts;
 		struct wait_queue_head	cq_wait;
+
+		/*
+		 * If registered with IORING_REGISTER_CQWAIT_REG, a single
+		 * page holds N entries, mapped in cq_wait_arg.
+		 */
+		struct page		**cq_wait_page;
+		struct io_uring_reg_wait *cq_wait_arg;
 	} ____cacheline_aligned_in_smp;
 
 	/* timeouts */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c4737892c7cd..4ead32fa9275 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -518,6 +518,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
 #define IORING_ENTER_ABS_TIMER		(1U << 5)
+#define IORING_ENTER_EXT_ARG_REG	(1U << 6)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -618,6 +619,9 @@ enum io_uring_register_op {
 	/* resize CQ ring */
 	IORING_REGISTER_RESIZE_RINGS		= 33,
 
+	/* register fixed io_uring_reg_wait arguments */
+	IORING_REGISTER_CQWAIT_REG		= 34,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -801,6 +805,20 @@ enum io_uring_register_restriction_op {
 	IORING_RESTRICTION_LAST
 };
 
+enum {
+	IORING_REG_WAIT_TS		= (1U << 0),
+};
+
+struct io_uring_reg_wait {
+	struct __kernel_timespec	ts;
+	__u32				min_wait_usec;
+	__u32				flags;
+	__u64				sigmask;
+	__u32				sigmask_sz;
+	__u32				pad[3];
+	__u64				pad2[2];
+};
+
 struct io_uring_getevents_arg {
 	__u64	sigmask;
 	__u32	sigmask_sz;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 612e7d66f845..0b76b4becda9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2735,6 +2735,10 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_alloc_cache_free(&ctx->msg_cache, io_msg_cache_free);
 	io_futex_cache_free(ctx);
 	io_destroy_buffers(ctx);
+	if (ctx->cq_wait_page) {
+		unsigned short npages = 1;
+		io_pages_unmap(ctx->cq_wait_arg, &ctx->cq_wait_page, &npages, true);
+	}
 	mutex_unlock(&ctx->uring_lock);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
@@ -3223,21 +3227,44 @@ void __io_uring_cancel(bool cancel_all)
 	io_uring_cancel_generic(cancel_all, NULL);
 }
 
-static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz)
+static struct io_uring_reg_wait *io_get_ext_arg_fixed(struct io_ring_ctx *ctx,
+			const struct io_uring_getevents_arg __user *uarg)
+{
+	struct io_uring_reg_wait *arg = READ_ONCE(ctx->cq_wait_arg);
+
+	if (arg) {
+		unsigned int index = (unsigned int) (uintptr_t) uarg;
+
+		if (index >= PAGE_SIZE / sizeof(struct io_uring_reg_wait))
+			return ERR_PTR(-EINVAL);
+		return arg + index;
+	}
+
+	return ERR_PTR(-EFAULT);
+}
+
+static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			       const void __user *argp, size_t argsz)
 {
-	if (flags & IORING_ENTER_EXT_ARG) {
-		struct io_uring_getevents_arg arg;
+	struct io_uring_getevents_arg arg;
 
-		if (argsz != sizeof(arg))
+	if (!(flags & IORING_ENTER_EXT_ARG))
+		return 0;
+
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		if (argsz != sizeof(struct io_uring_reg_wait))
 			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		return PTR_ERR(io_get_ext_arg_fixed(ctx, argp));
 	}
+	if (argsz != sizeof(arg))
+		return -EINVAL;
+	if (copy_from_user(&arg, argp, sizeof(arg)))
+		return -EFAULT;
 	return 0;
 }
 
-static int io_get_ext_arg(unsigned flags, const void __user *argp,
-			  struct ext_arg *ext_arg)
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			  const void __user *argp, struct ext_arg *ext_arg)
 {
 	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
@@ -3251,6 +3278,28 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		return 0;
 	}
 
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		struct io_uring_reg_wait *w;
+
+		if (ext_arg->argsz != sizeof(struct io_uring_reg_wait))
+			return -EINVAL;
+		w = io_get_ext_arg_fixed(ctx, argp);
+		if (IS_ERR(w))
+			return PTR_ERR(w);
+
+		if (w->flags & ~IORING_REG_WAIT_TS)
+			return -EINVAL;
+		ext_arg->min_time = READ_ONCE(w->min_wait_usec) * NSEC_PER_USEC;
+		ext_arg->sig = u64_to_user_ptr(READ_ONCE(w->sigmask));
+		ext_arg->argsz = READ_ONCE(w->sigmask_sz);
+		if (w->flags & IORING_REG_WAIT_TS) {
+			ext_arg->ts.tv_sec = READ_ONCE(w->ts.tv_sec);
+			ext_arg->ts.tv_nsec = READ_ONCE(w->ts.tv_nsec);
+			ext_arg->ts_set = true;
+		}
+		return 0;
+	}
+
 	/*
 	 * EXT_ARG is set - ensure we agree on the size of it and copy in our
 	 * timespec and sigset_t pointers if good.
@@ -3289,7 +3338,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
 			       IORING_ENTER_REGISTERED_RING |
-			       IORING_ENTER_ABS_TIMER)))
+			       IORING_ENTER_ABS_TIMER |
+			       IORING_ENTER_EXT_ARG_REG)))
 		return -EINVAL;
 
 	/*
@@ -3372,7 +3422,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		 */
 		mutex_lock(&ctx->uring_lock);
 iopoll_locked:
-		ret2 = io_validate_ext_arg(flags, argp, argsz);
+		ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete,
 					   ctx->cq_entries);
@@ -3382,7 +3432,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	} else {
 		struct ext_arg ext_arg = { .argsz = argsz };
 
-		ret2 = io_get_ext_arg(flags, argp, &ext_arg);
+		ret2 = io_get_ext_arg(ctx, flags, argp, &ext_arg);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete,
 					   ctx->cq_entries);
diff --git a/io_uring/register.c b/io_uring/register.c
index 8fbce6f268b6..edf6c218b228 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -520,6 +520,48 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	return 0;
 }
 
+/*
+ * Register a page holding N entries of struct io_uring_reg_wait, which can
+ * be used via io_uring_enter(2) if IORING_GETEVENTS_EXT_ARG_REG is set.
+ * If that is set with IORING_GETEVENTS_EXT_ARG, then instead of passing
+ * in a pointer for a struct io_uring_getevents_arg, an index into this
+ * registered array is passed, avoiding two (arg + timeout) copies per
+ * invocation.
+ */
+static int io_register_cqwait_reg(struct io_ring_ctx *ctx, void __user *uarg,
+				  unsigned nr_args)
+{
+	struct io_uring_reg_wait *arg;
+	struct page **pages;
+	unsigned long len;
+	int nr_pages;
+
+	if (ctx->cq_wait_page || ctx->cq_wait_arg)
+		return -EBUSY;
+	if (check_mul_overflow(sizeof(*arg), nr_args, &len))
+		return -EOVERFLOW;
+	if (len > PAGE_SIZE)
+		return -EINVAL;
+
+	pages = io_pin_pages((unsigned long) uarg, len, &nr_pages);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+	if (nr_pages != 1) {
+		io_pages_free(&pages, nr_pages);
+		return -EINVAL;
+	}
+
+	arg = vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+	if (arg) {
+		WRITE_ONCE(ctx->cq_wait_page, pages);
+		WRITE_ONCE(ctx->cq_wait_arg, arg);
+		return 0;
+	}
+
+	io_pages_free(&pages, 1);
+	return -ENOMEM;
+}
+
 static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			       void __user *arg, unsigned nr_args)
 	__releases(ctx->uring_lock)
@@ -714,6 +756,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_resize_rings(ctx, arg);
 		break;
+	case IORING_REGISTER_CQWAIT_REG:
+		ret = -EINVAL;
+		if (!arg)
+			break;
+		ret = io_register_cqwait_reg(ctx, arg, nr_args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;