From patchwork Mon Sep 13 20:01:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490707 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-21.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 182F5C433EF for ; Mon, 13 Sep 2021 20:04:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 00D9661216 for ; Mon, 13 Sep 2021 20:04:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347798AbhIMUFr (ORCPT ); Mon, 13 Sep 2021 16:05:47 -0400 Received: from mga05.intel.com ([192.55.52.43]:38689 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241346AbhIMUFq (ORCPT ); Mon, 13 Sep 2021 16:05:46 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336355" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336355" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643902" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:29 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 01/13] x86/uintr/man-page: Include man pages draft for reference Date: Mon, 13 Sep 2021 13:01:20 -0700 Message-Id: <20210913200132.3396598-2-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Included here in plain text format for reference and review. The formatting for the man pages still needs a little bit of work. 
Signed-off-by: Sohil Mehta
---
 tools/uintr/manpages/0_overview.txt          | 265 ++++++++++++++++++
 tools/uintr/manpages/1_register_receiver.txt | 122 ++++++++
 .../uintr/manpages/2_unregister_receiver.txt |  62 ++++
 tools/uintr/manpages/3_create_fd.txt         | 104 +++++++
 tools/uintr/manpages/4_register_sender.txt   | 121 ++++++++
 tools/uintr/manpages/5_unregister_sender.txt |  79 ++++++
 tools/uintr/manpages/6_wait.txt              |  59 ++++
 7 files changed, 812 insertions(+)
 create mode 100644 tools/uintr/manpages/0_overview.txt
 create mode 100644 tools/uintr/manpages/1_register_receiver.txt
 create mode 100644 tools/uintr/manpages/2_unregister_receiver.txt
 create mode 100644 tools/uintr/manpages/3_create_fd.txt
 create mode 100644 tools/uintr/manpages/4_register_sender.txt
 create mode 100644 tools/uintr/manpages/5_unregister_sender.txt
 create mode 100644 tools/uintr/manpages/6_wait.txt

diff --git a/tools/uintr/manpages/0_overview.txt b/tools/uintr/manpages/0_overview.txt
new file mode 100644
index 000000000000..349538effb15
--- /dev/null
+++ b/tools/uintr/manpages/0_overview.txt
@@ -0,0 +1,265 @@
+UINTR(7)              Miscellaneous Information Manual              UINTR(7)
+
+NAME
+       Uintr - overview of User Interrupts
+
+DESCRIPTION
+       User Interrupts (Uintr) provides a low-latency event delivery and
+       inter-process communication mechanism. These events can be delivered
+       directly to userspace without a transition to the kernel.
+
+       In the User Interrupts hardware architecture, a receiver is always
+       expected to be a user space task. However, a user interrupt can be sent
+       by another user space task, the kernel or an external source (like a
+       device). The feature that allows another userspace task to send an
+       interrupt is referred to as User IPI.
+
+       Uintr is a hardware-dependent, opt-in feature. Applications aren't
+       expected or able to send or receive interrupts unless they register
+       themselves with the kernel using the syscall interface described below.
+       It is recommended that applications wanting to use User Interrupts call
+       uintr_register_handler(2) and test whether the call succeeds.
+
+       Hardware support for User Interrupts may be detected using other
+       mechanisms, but these could be misleading and are generally not needed:
+       - Using the cpuid instruction (Refer to the Intel Software Developer's
+         Manual).
+       - Checking for the "uintr" string in /proc/cpuinfo under the "flags"
+         field.
+
+       Applications wanting to use Uintr should also be able to function
+       without it. Uintr support might be unavailable for any one of the
+       following reasons:
+       - the kernel code does not contain support
+       - the kernel support has been disabled
+       - the hardware does not support it
+
+   Uintr syscall interface
+       Applications can use and manage Uintr using the system calls described
+       here. The Uintr system calls are available only if the kernel is
+       configured with the CONFIG_X86_USER_INTERRUPTS option.
+
+       1) A user application registers an interrupt handler using
+       uintr_register_handler(2). The registered interrupt handler will be
+       invoked when a user interrupt is delivered to that thread. Only one
+       interrupt handler can be registered by a particular thread within a
+       process.
+
+       2) Each thread that registered a handler has its own unique vector
+       space of 64 vectors. The thread can then use uintr_create_fd(2) to
+       register a vector and create a user interrupt file descriptor -
+       uintr_fd.
+
+       3) The uintr_fd is associated with only one Uintr vector. A new
+       uintr_fd must be created for each of the 64 vectors. uintr_fd is
+       automatically inherited by forked processes but the receiver can also
+       share the uintr_fd with potential senders using any of the existing FD
+       sharing mechanisms (like pidfd_getfd(2) or socket sendmsg(2)). Access
+       to uintr_fd allows a sender to generate an interrupt with the
+       associated vector. Upon interrupt delivery, the interrupt handler is
+       invoked with the vector number pushed onto the stack to identify the
+       source of the interrupt.
+
+       4) Each thread has a local flag called User Interrupt Flag (UIF). The
+       thread can set or clear this flag to enable or disable interrupts. The
+       default value of UIF is always 0 (interrupts disabled). A receiver must
+       execute the _stui() intrinsic instruction at some point (before or
+       anytime after registration) to start receiving user interrupts. To
+       disable interrupts during critical sections, the thread can call the
+       _clui() instruction to clear UIF.
+
+       5a) For sending a user IPI, the sender task registers with the kernel
+       using uintr_register_sender(2). The kernel sets up the routing tables
+       to connect the sender and receiver. The syscall returns an index that
+       can be used with the 'SENDUIPI <uipi_index>' instruction to send a user
+       IPI. If the receiver is running, the interrupt is delivered directly
+       to the receiver without any kernel intervention.
+
+       5b) If the sender is the kernel or an external source, the uintr_fd can
+       be passed on to the related kernel entity to allow it to connect and
+       generate the user interrupt.
+
+       6) The receiver can block in the kernel while it is waiting for user
+       interrupts to get delivered using uintr_wait(2). If the receiver has
+       been context switched out due to other reasons, the user interrupt will
+       be delivered when the receiver gets scheduled back in.
+
+       7) The sender and receiver are expected to coordinate and then call the
+       teardown syscalls to terminate the connection:
+       a. A sender unregisters with uintr_unregister_sender(2)
+       b. A vector is unregistered using close(uintr_fd)
+       c. A receiver unregisters with uintr_unregister_handler(2)
+
+       If the sender and receiver aren't able to coordinate, some shared
+       kernel resources between them would get freed later when the file
+       descriptors get released automatically on process exit.
+
+       Multi-threaded applications need to be careful when using Uintr since
+       it is a thread-specific feature. Actions by one thread don't carry over
+       to other threads of the same application.
+
+   Toolchain support
+       Support has been added to GCC (11.1) and Binutils (2.36.1) to enable
+       the user interrupt intrinsic instructions and the -muintr compiler
+       flag.
+
+       The "(interrupt)" attribute can be used to compile a function as a user
+       interrupt handler. In conjunction with the '-muintr' flag, the compiler
+       would:
+       - Generate the entry and exit sequences for the User interrupt handler
+       - Handle the saving and restoring of registers
+       - Call uiret to return from a user interrupt handler
+
+       User Interrupts related compiler intrinsic instructions:
+
+       _clui() - Disable user interrupts - clear UIF (User Interrupt Flag).
+
+       _stui() - enable user interrupts - set UIF.
+
+       _testui() - test current value of UIF.
+
+       _uiret() - return from a user interrupt handler.
+
+       _senduipi(uipi_index) - send a user IPI to a target task. The
+       uipi_index is obtained using uintr_register_sender(2).
+
+   Interrupt handler restrictions
+       There are restrictions on what can be done in a user interrupt handler.
+ + For example, the handler and the functions called from the handler + should only use general purpose registers. + + For details refer the Uintr compiler programming guide. + https://github.com/intel/uintr-compiler-guide/blob/uintr- + gcc-11.1/UINTR-compiler-guide.pdf + + +CONFORMING TO + Uintr related system calls are Linux specific. + +EXAMPLES + Build + To compile this sample an updated toolchain is needed. + - Use GCC release 11 or higher & + - Use Binutils release 2.36 or higher + + gcc -muintr -mgeneral-regs-only -minline-all-stringops uipi_sample.c -lpthread -o uipi_sample + + + Run + $./uipi_sample + Receiver enabled interrupts + Sending IPI from sender thread + -- User Interrupt handler -- + Success + + + Program source + #define _GNU_SOURCE + #include + #include + #include + #include + #include + #include + + #define __NR_uintr_register_handler 449 + #define __NR_uintr_unregister_handler 450 + #define __NR_uintr_create_fd 451 + #define __NR_uintr_register_sender 452 + #define __NR_uintr_unregister_sender 453 + + #define uintr_register_handler(handler, flags) syscall(__NR_uintr_register_handler, handler, flags) + #define uintr_unregister_handler(flags) syscall(__NR_uintr_unregister_handler, flags) + #define uintr_create_fd(vector, flags) syscall(__NR_uintr_create_fd, vector, flags) + #define uintr_register_sender(fd, flags) syscall(__NR_uintr_register_sender, fd, flags) + #define uintr_unregister_sender(fd, flags) syscall(__NR_uintr_unregister_sender, fd, flags) + + unsigned int uintr_received; + unsigned int uintr_fd; + + void __attribute__ ((interrupt)) uintr_handler(struct __uintr_frame *ui_frame, + unsigned long long vector) + { + static const char print[] = "\t-- User Interrupt handler --\n"; + + write(STDOUT_FILENO, print, sizeof(print) - 1); + uintr_received = 1; + } + + void *sender_thread(void *arg) + { + int uipi_index; + + uipi_index = uintr_register_sender(uintr_fd, 0); + if (uipi_index < 0) { + printf("Sender register error\n"); + exit(EXIT_FAILURE); + } + + printf("Sending IPI from sender thread\n"); + _senduipi(uipi_index); + + uintr_unregister_sender(uintr_fd, 0); + + return NULL; + } + + int main(int argc, char *argv[]) + { + pthread_t pt; + + if (uintr_register_handler(uintr_handler, 0)) { + printf("Interrupt handler register error\n"); + exit(EXIT_FAILURE); + } + + uintr_fd = uintr_create_fd(0, 0); + if (uintr_fd < 0) { + printf("Interrupt vector registration error\n"); + exit(EXIT_FAILURE); + } + + _stui(); + printf("Receiver enabled interrupts\n"); + + if (pthread_create(&pt, NULL, &sender_thread, NULL)) { + printf("Error creating sender thread\n"); + exit(EXIT_FAILURE); + } + + /* Do some other work */ + while (!uintr_received) + usleep(1); + + pthread_join(pt, NULL); + close(uintr_fd); + uintr_unregister_handler(0); + + printf("Success\n"); + exit(EXIT_SUCCESS); + } + + +NOTES + Currently, there is no glibc wrapper for the Uintr related system call; + call the system calls using syscall(2). 
+ + + + UINTR(7) diff --git a/tools/uintr/manpages/1_register_receiver.txt b/tools/uintr/manpages/1_register_receiver.txt new file mode 100644 index 000000000000..4b6652c94faa --- /dev/null +++ b/tools/uintr/manpages/1_register_receiver.txt @@ -0,0 +1,122 @@ +uintr_register_handler(2) System Calls Manual uintr_register_handler(2) + + + +NAME + uintr_register_handler - register a user interrupt handler + + +SYNOPSIS + int uintr_register_handler(u64 handler_address, unsigned int flags); + + +DESCRIPTION + uintr_register_handler() registers a user interrupt handler for the + calling process. In case of multi-threaded processes the user interrupt + handler is only registered for the thread that makes this system call. + + The handler_address is the function that would be invoked when the + process receives a user interrupt. The function should be defined as + below: + + void __attribute__ ((interrupt)) ui_handler(struct __uintr_frame *frame, + unsigned long long vector) + + For more details and an example for the handler definition refer + uintr(7). + + Providing an invalid handler_address could lead to undefined behavior + for the process. + + The flags argument is reserved for future use. Currently, it must be + specified as 0. + + Each user thread can register only one interrupt handler. Each thread + that would like to be a receiver must register once. The registration + is not inherited across forks(2) or when additional threads are created + within the same process. + + Each thread within a process gets its own interrupt vector space for 64 + vectors. The vector number is pushed onto the stack when a user + interrupt is delivered. Since the vector space is per-thread, each + receiver can receive up to 64 unique interrupt events. + + For information on creating uintr_fd to register and manage interrupt + vectors, refer uintr_create_fd(2) system call. + + Once an interrupt handler is registered it cannot be changed before the + handler is unregistered via uintr_unregister_handler(2). Calling + uintr_unregister_handler(2) would however invalidate the current + interrupt resources registered with the kernel. + + The interrupt handler gets invoked only while the process is running. + If the process is scheduled out or blocked in the kernel, interrupts + will be delivered when the process is scheduled again. + + + Interrupt handler restrictions + There are restrictions on what can be done in a user interrupt handler. + + For example, the handler and the functions called from the handler + should only use general purpose registers. + + For details refer the Uintr compiler programming guide. + https://github.com/intel/uintr-compiler-guide/blob/uintr- + gcc-11.1/UINTR-compiler-guide.pdf + + + Security implications + A lot of security issues that are applicable to signal handlers, also + apply to user interrupt handlers. + + The user interrupt handler by-itself need not be re-entrant since + interrupts are automatically disabled when the handler is invoked. But + this isn't valid if the handler is shared between multiple threads or + nested interrupts have been enabled. + + Similar to signal handlers, the functions that are called from a user + interrupt should be async-signal-safe. Refer signal-safety(7) for a + discussion of async-signal-safe functions. + + It is recommended to disable interrupts using _clui() instruction + before executing any privileged code. Doing so would prevent a user + interrupt handler from running at a higher privilege level. 
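       The registration and UIF usage described above, as a minimal sketch
       (not part of the patch). It assumes the RFC's sample syscall number
       (449, from the uintr(7) draft), the -muintr toolchain support, and
       that the _stui()/_clui() intrinsics come from <x86gprintrin.h> in
       GCC 11; ui_handler is a placeholder name. Build as in the uintr(7)
       example: gcc -muintr -mgeneral-regs-only -minline-all-stringops.

           #define _GNU_SOURCE
           #include <unistd.h>
           #include <sys/syscall.h>
           #include <x86gprintrin.h>        /* _stui()/_clui(), assumed header */

           #define __NR_uintr_register_handler 449   /* RFC sample number */
           #define uintr_register_handler(handler, flags) \
                   syscall(__NR_uintr_register_handler, handler, flags)

           /* Handler: general purpose registers only, async-signal-safe code */
           void __attribute__((interrupt))
           ui_handler(struct __uintr_frame *frame, unsigned long long vector)
           {
           }

           int main(void)
           {
                   if (uintr_register_handler(ui_handler, 0))
                           return 1;        /* e.g. EOPNOTSUPP, EBUSY */

                   _stui();                 /* set UIF: start receiving */

                   _clui();                 /* clear UIF around critical or
                                               privileged sections, as above */
                   /* ... critical section ... */
                   _stui();                 /* re-enable user interrupts */

                   return 0;
           }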
+ + +RETURN VALUE + On success, uintr_register_handler() returns 0. On error, -1 is + returned and errno is set to indicate the cause of the error. + + +ERRORS + EOPNOTSUPP Underlying hardware doesn't have support for Uintr. + + EINVAL flags is not 0. + + EFAULT handler address is not valid. + + ENOMEM The system is out of available memory. + + EBUSY An interrupt handler has already been registered. + + +VERSIONS + uintr_register_handler() first appeared in Linux . + + +CONFORMING TO + uintr_register_handler() is Linux specific. + + +NOTES + Currently, there is no glibc wrapper for this system call; call it + using syscall(2). + + The user interrupt related system calls need hardware support to + generate and receive user interrupts. Refer uintr(7) for details. + + + + uintr_register_handler(2) diff --git a/tools/uintr/manpages/2_unregister_receiver.txt b/tools/uintr/manpages/2_unregister_receiver.txt new file mode 100644 index 000000000000..dd6981f33597 --- /dev/null +++ b/tools/uintr/manpages/2_unregister_receiver.txt @@ -0,0 +1,62 @@ +uintr_unregister_handler(2) System Calls Manual uintr_unregister_handler(2) + + + +NAME + uintr_unregister_handler - unregister a user interrupt handler + + +SYNOPSIS + int uintr_unregister_handler(unsigned int flags); + + +DESCRIPTION + uintr_unregister_handler() unregisters a previously registered user + interrupt handler. If no interrupt handler was registered by the + process uintr_unregister_handler() would return an error. + + Since interrupt handler is local to a thread, only the thread that has + registered via uintr_register_handler(2) can call + uintr_unregister_handler(). + + The interrupt resources such as interrupt vectors and uintr_fd, that + have been allocated for this thread, would be deactivated. Other + senders posting interrupts to this thread will not be delivered. + + The kernel does not automatically close the uintr_fds related to this + process/thread when uintr_unregister_handler() is called. The + application is expected to close the unused uintr_fds before or the + after the handler has been unregistered. + + +RETURN VALUE + On success, uintr_unregister_handler() returns 0. On error, -1 is + returned and errno is set to indicate the cause of the error. + + +ERRORS + EOPNOTSUPP Underlying hardware doesn't have support for Uintr. + + EINVAL flags is not 0. + + EINVAL No registered user interrupt handler. + + +VERSIONS + uintr_unregister_handler() first appeared in Linux . + + +CONFORMING TO + uintr_unregister_handler() is Linux specific. + + +NOTES + Currently, there is no glibc wrapper for this system call; call it + using syscall(2). + + The user interrupt related system calls need hardware support to + generate and receive user interrupts. Refer uintr(7) for details. + + + + uintr_unregister_handler(2) diff --git a/tools/uintr/manpages/3_create_fd.txt b/tools/uintr/manpages/3_create_fd.txt new file mode 100644 index 000000000000..e90b0dce2703 --- /dev/null +++ b/tools/uintr/manpages/3_create_fd.txt @@ -0,0 +1,104 @@ +uintr_create_fd(2) System Calls Manual uintr_create_fd(2) + + + +NAME + uintr_create_fd - Create a user interrupt file descriptor - uintr_fd + + +SYNOPSIS + int uintr_create_fd(u64 vector, unsigned int flags); + + +DESCRIPTION + uintr_create_fd() allocates a new user interrupt file descriptor + (uintr_fd) based on the vector registered by the calling process. The + uintr_fd can be shared with other processes and the kernel to allow + them to generate interrupts with the associated vector. 
+ + The caller must have registered a handler via uintr_register_handler(2) + before attempting to create uintr_fd. The interrupts generated based on + this uintr_fd will be delivered only to the thread that created this + file descriptor. A unique uintr_fd is generated for each vector + registered using uintr_create_fd(). + + Each thread has a private vector space of 64 vectors ranging from 0-63. + Vector number 63 has the highest priority while vector number 0 has the + lowest. If two or more interrupts are pending to be delivered then the + interrupt with the higher vector number will be delivered first + followed by the ones with lower vector numbers. Applications can choose + appropriate vector numbers to prioritize certain interrupts over + others. + + Upon interrupt delivery, the handler is invoked with the vector number + pushed onto the stack to help identify the source of the interrupt. + Since the vector space is per-thread, each receiver can receive up to + 64 unique interrupt events. + + A receiver can choose to share the same uintr_fd with multiple senders. + Since an interrupt with the same vector number would be delivered, the + receiver would need to use other mechanisms to identify the exact + source of the interrupt. + + The flags argument is reserved for future use. Currently, it must be + specified as 0. + + close(2) + When the file descriptor is no longer required it should be + closed. When all file descriptors associated with the same + uintr_fd object have been closed, the resources for object are + freed by the kernel. + + fork(2) + A copy of the file descriptor created by uintr_create_fd() is + inherited by the child produced by fork(2). The duplicate file + descriptor is associated with the same uintr_fd object. The + close-on-exec flag (FD_CLOEXEC; see fcntl(2)) is set on the + file descriptor returned by uintr_create_fd(). + + For information on how to generate interrupts with uintr_fd refer + uintr_register_sender(2). + + +RETURN VALUE + On success, uintr_create_fd() returns a new uintr_fd file descriptor. + On error, -1 is returned and errno is set to indicate the cause of the + error. + + +ERRORS + EOPNOTSUPP Underlying hardware doesn't have support for Uintr. + + EINVAL flags is not 0. + + EFAULT handler address is not valid. + + EMFILE The per-process limit on the number of open file + descriptors has been reached. + + ENFILE The system-wide limit on the total number of open files + has been reached. + + ENODEV Could not mount (internal) anonymous inode device. + + ENOMEM The system is out of available memory to allocate uintr_fd. + + +VERSIONS + uintr_create_fd() first appeared in Linux . + + +CONFORMING TO + uintr_create_fd() is Linux specific. + + +NOTES + Currently, there is no glibc wrapper for this system call; call it + using syscall(2). + + The user interrupt related system calls need hardware support to + generate and receive user interrupts. Refer uintr(7) for details. 
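       Since the page notes that a uintr_fd can be handed to potential senders
       through the existing FD sharing mechanisms (uintr(7) mentions
       pidfd_getfd(2) and socket sendmsg(2)), the sketch below shows that step
       using ordinary SCM_RIGHTS passing over a connected AF_UNIX socket. It
       is not part of the patch and nothing in it is Uintr-specific;
       send_uintr_fd is just an illustrative name.

           #include <string.h>
           #include <sys/socket.h>
           #include <sys/uio.h>

           /* Send a uintr_fd to a potential sender over a connected
            * AF_UNIX socket, as SCM_RIGHTS ancillary data. */
           static int send_uintr_fd(int sock, int uintr_fd)
           {
                   char data = 'u';
                   struct iovec iov = { .iov_base = &data, .iov_len = 1 };
                   union {
                           char buf[CMSG_SPACE(sizeof(int))];
                           struct cmsghdr align;
                   } u;
                   struct msghdr msg = {
                           .msg_iov = &iov,
                           .msg_iovlen = 1,
                           .msg_control = u.buf,
                           .msg_controllen = sizeof(u.buf),
                   };
                   struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

                   cmsg->cmsg_level = SOL_SOCKET;
                   cmsg->cmsg_type = SCM_RIGHTS;
                   cmsg->cmsg_len = CMSG_LEN(sizeof(int));
                   memcpy(CMSG_DATA(cmsg), &uintr_fd, sizeof(int));

                   return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
           }

       The receiving process recovers the descriptor with the matching
       recvmsg(2) call and can then pass it to uintr_register_sender(2).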
+
+
+                                                          uintr_create_fd(2)

diff --git a/tools/uintr/manpages/4_register_sender.txt b/tools/uintr/manpages/4_register_sender.txt
new file mode 100644
index 000000000000..1dc17f4c041f
--- /dev/null
+++ b/tools/uintr/manpages/4_register_sender.txt
@@ -0,0 +1,121 @@
+uintr_register_sender(2)        System Calls Manual       uintr_register_sender(2)
+
+NAME
+       uintr_register_sender - Register a user inter-process interrupt sender
+
+SYNOPSIS
+       int uintr_register_sender(int uintr_fd, unsigned int flags);
+
+DESCRIPTION
+       uintr_register_sender() allows a sender process to connect with a Uintr
+       receiver based on the uintr_fd. It returns a user IPI index
+       (uipi_index) that the sender process can use in conjunction with the
+       SENDUIPI instruction to generate a user IPI.
+
+       When a sender executes 'SENDUIPI <uipi_index>', a user IPI can be
+       delivered by the hardware to the receiver without any intervention from
+       the kernel. Upon IPI delivery, the handler is invoked with the vector
+       number associated with uintr_fd pushed onto the stack to help
+       identify the source of the interrupt.
+
+       If the receiver task is running, the hardware would directly deliver
+       the user IPI to the receiver. If the receiver is not running or has
+       disabled receiving interrupts using the CLUI instruction, the
+       interrupt will be stored in memory and delivered when the receiver is
+       able to receive it.
+
+       If the sender tries to send multiple IPIs while the receiver is not
+       able to receive them, then all the IPIs with the same vector would be
+       coalesced. Only a single IPI per vector would be delivered.
+
+       uintr_register_sender() can be used to connect with multiple uintr_fds.
+       uintr_register_sender() would return a unique uipi_index for each
+       uintr_fd the sender connects with.
+
+       In case of a multi-threaded process, the uipi_index is only valid for
+       the thread that registered itself. Other threads would need to register
+       themselves if they intend to be a user IPI sender. Executing SENDUIPI
+       on different threads can have varying results based on the connections
+       that have been set up.
+
+       If a process uses SENDUIPI without registering using
+       uintr_register_sender() it receives a SIGILL signal. If a process uses
+       an illegal uipi_index, it receives a SIGSEGV signal. See sigaction(2)
+       for details of the information available with that signal.
+
+       The flags argument is reserved for future use. Currently, it must be
+       specified as 0.
+
+       close(2)
+              When the file descriptor is no longer required it should be
+              closed. When all file descriptors associated with the same
+              uintr_fd object have been closed, the resources for the object
+              are freed by the kernel. Freeing the uintr_fd object would also
+              result in the associated uipi_index being freed.
+
+       fork(2)
+              A copy of uintr_fd is inherited by the child produced by
+              fork(2). However, the uipi_index would not get inherited by the
+              child. If the child wants to send a user IPI, it would have to
+              explicitly register itself using the uintr_register_sender()
+              system call.
+
+       For information on how to unregister a sender refer to
+       uintr_unregister_sender(2).
+
+RETURN VALUE
+       On success, uintr_register_sender() returns a new user IPI index -
+       uipi_index. On error, -1 is returned and errno is set to indicate the
+       cause of the error.
+
+ERRORS
+       EOPNOTSUPP  Underlying hardware doesn't have support for uintr(7).
+
+       EOPNOTSUPP  uintr_fd does not refer to a Uintr instance.
+
+       EBADF       The uintr_fd passed to the kernel is invalid.
+
+       EINVAL      flags is not 0.
+       EISCONN     A connection to this uintr_fd has already been established.
+
+       ECONNRESET  The user interrupt receiver has disabled the connection.
+
+       ESHUTDOWN   The user interrupt receiver has exited the connection.
+
+       ENOSPC      No uipi_index can be allocated. The system has run out of
+                   the available user IPI indexes.
+
+       ENOMEM      The system is out of available memory to register a user
+                   IPI sender.
+
+VERSIONS
+       uintr_register_sender() first appeared in Linux .
+
+CONFORMING TO
+       uintr_register_sender() is Linux specific.
+
+NOTES
+       Currently, there is no glibc wrapper for this system call; call it
+       using syscall(2).
+
+       The user interrupt related system calls need hardware support to
+       generate and receive user interrupts. Refer to uintr(7) for details.
+
+
+                                                      uintr_register_sender(2)

diff --git a/tools/uintr/manpages/5_unregister_sender.txt b/tools/uintr/manpages/5_unregister_sender.txt
new file mode 100644
index 000000000000..31a8c574dc25
--- /dev/null
+++ b/tools/uintr/manpages/5_unregister_sender.txt
@@ -0,0 +1,79 @@
+uintr_unregister_sender(2)      System Calls Manual     uintr_unregister_sender(2)
+
+NAME
+       uintr_unregister_sender - Unregister a user inter-process interrupt
+       sender
+
+SYNOPSIS
+       int uintr_unregister_sender(int uintr_fd, unsigned int flags);
+
+DESCRIPTION
+       uintr_unregister_sender() unregisters a sender process from a uintr_fd
+       it had previously connected with. If no connection is present with this
+       uintr_fd, the system call returns an error.
+
+       The uipi_index that was allocated during uintr_register_sender(2) will
+       also be freed. If a process tries to use a uipi_index after it has been
+       freed, it would receive a SIGSEGV signal.
+
+       In case of a multi-threaded process, uintr_unregister_sender() will
+       only disconnect the thread that makes this call. Other threads can
+       continue to use their connection with the uintr_fd based on their
+       uipi_index.
+
+       The flags argument is reserved for future use. Currently, it must be
+       specified as 0.
+
+       close(2)
+              When the file descriptor is no longer required it should be
+              closed. When all file descriptors associated with the same
+              uintr_fd object have been closed, the resources for the object
+              are freed by the kernel. Freeing the uintr_fd object would also
+              result in the associated uipi_index being freed.
+
+       The behavior of the uintr_unregister_sender() system call after
+       uintr_fd has been closed is undefined.
+
+RETURN VALUE
+       On success, uintr_unregister_sender() returns 0. On error, -1 is
+       returned and errno is set to indicate the cause of the error.
+
+ERRORS
+       EOPNOTSUPP  Underlying hardware doesn't have support for uintr(7).
+
+       EOPNOTSUPP  uintr_fd does not refer to a Uintr instance.
+
+       EBADF       The uintr_fd passed to the kernel is invalid.
+
+       EINVAL      flags is not 0.
+
+       EINVAL      No connection has been set up with this uintr_fd.
+
+VERSIONS
+       uintr_unregister_sender() first appeared in Linux .
+
+CONFORMING TO
+       uintr_unregister_sender() is Linux specific.
+
+NOTES
+       Currently, there is no glibc wrapper for this system call; call it
+       using syscall(2).
+
+       The user interrupt related system calls need hardware support to
+       generate and receive user interrupts. Refer to uintr(7) for details.
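       A sender-side sketch tying the last two pages together (not part of
       the patch). It assumes the RFC's sample syscall numbers 452/453 from
       the uintr(7) draft and the _senduipi() intrinsic from the -muintr
       toolchain (header assumed to be <x86gprintrin.h>); fd_a and fd_b stand
       for uintr_fds obtained from one or more receivers, and each connection
       yields its own uipi_index.

           #include <unistd.h>
           #include <sys/syscall.h>
           #include <x86gprintrin.h>                 /* _senduipi(), assumed header */

           #define __NR_uintr_register_sender   452  /* RFC sample numbers */
           #define __NR_uintr_unregister_sender 453
           #define uintr_register_sender(fd, flags) \
                   syscall(__NR_uintr_register_sender, fd, flags)
           #define uintr_unregister_sender(fd, flags) \
                   syscall(__NR_uintr_unregister_sender, fd, flags)

           static int notify_both(int fd_a, int fd_b)
           {
                   int idx_a = uintr_register_sender(fd_a, 0);
                   int idx_b = uintr_register_sender(fd_b, 0);

                   if (idx_a < 0 || idx_b < 0)
                           goto out;

                   _senduipi(idx_a);                 /* user IPI on fd_a's vector */
                   _senduipi(idx_b);                 /* user IPI on fd_b's vector */
           out:
                   if (idx_a >= 0)
                           uintr_unregister_sender(fd_a, 0);
                   if (idx_b >= 0)
                           uintr_unregister_sender(fd_b, 0);
                   return (idx_a < 0 || idx_b < 0) ? -1 : 0;
           }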
+ + + + uintr_unregister_sender(2) diff --git a/tools/uintr/manpages/6_wait.txt b/tools/uintr/manpages/6_wait.txt new file mode 100644 index 000000000000..f281a6ce83aa --- /dev/null +++ b/tools/uintr/manpages/6_wait.txt @@ -0,0 +1,59 @@ +uintr_wait(2) System Calls Manual uintr_wait(2) + + + +NAME + uintr_wait - wait for user interrupts + + +SYNOPSIS + int uintr_wait(unsigned int flags); + + +DESCRIPTION + uintr_wait() causes the calling process (or thread) to sleep until a + user interrupt is delivered. + + uintr_wait() will block in the kernel only when a interrupt handler has + been registered using uintr_register_handler(2) + + + + +RETURN VALUE + uintr_wait() returns only when a user interrupt is received and the + interrupt handler function returned. In this case, -1 is returned and + errno is set to EINTR. + + +ERRORS + EOPNOTSUPP Underlying hardware doesn't have support for Uintr. + + EOPNOTSUPP No interrupt handler registered. + + EINVAL flags is not 0. + + EINTR A user interrupt was received and the interrupt handler + returned. + + +VERSIONS + uintr_wait() first appeared in Linux . + + +CONFORMING TO + uintr_wait() is Linux specific. + + +NOTES + Currently, there is no glibc wrapper for this system call; call it + using syscall(2). + + The user interrupt related system calls need hardware support to + generate and receive user interrupts. Refer uintr(7) for details. + + + + uintr_wait(2) From patchwork Mon Sep 13 20:01:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D3F2C43219 for ; Mon, 13 Sep 2021 20:04:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 811DF61130 for ; Mon, 13 Sep 2021 20:04:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237491AbhIMUFt (ORCPT ); Mon, 13 Sep 2021 16:05:49 -0400 Received: from mga05.intel.com ([192.55.52.43]:38689 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347782AbhIMUFr (ORCPT ); Mon, 13 Sep 2021 16:05:47 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336358" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336358" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643906" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:29 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 02/13] Documentation/x86: Add documentation for User Interrupts Date: Mon, 13 Sep 2021 13:01:21 -0700 Message-Id: <20210913200132.3396598-3-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org For now, include just the hardware and software architecture summary. Signed-off-by: Sohil Mehta --- Documentation/x86/index.rst | 1 + Documentation/x86/user-interrupts.rst | 107 ++++++++++++++++++++++++++ 2 files changed, 108 insertions(+) create mode 100644 Documentation/x86/user-interrupts.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index 383048396336..0d416b02131b 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -31,6 +31,7 @@ x86-specific Documentation tsx_async_abort buslock usb-legacy-support + user-interrupts i386/index x86_64/index sva diff --git a/Documentation/x86/user-interrupts.rst b/Documentation/x86/user-interrupts.rst new file mode 100644 index 000000000000..bc90251d6c2e --- /dev/null +++ b/Documentation/x86/user-interrupts.rst @@ -0,0 +1,107 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +User Interrupts (UINTR) +======================= + +Overview +======== +User Interrupts provides a low latency event delivery and inter process +communication mechanism. These events can be delivered directly to userspace +without a transition through the kernel. + +In the User Interrupts architecture, a receiver is always expected to be a user +space task. However, a user interrupt can be sent by another user space task, +kernel or an external source (like a device). The feature that allows another +task to send an interrupt is referred to as User IPI. + +Hardware Summary +================ +User Interrupts is a posted interrupt delivery mechanism. The interrupts are +first posted to a memory location and then delivered to the receiver when they +are running with CPL=3. + +Kernel managed architectural data structures +-------------------------------------------- +UPID: User Posted Interrupt Descriptor - Holds receiver interrupt vector +information and notification state (like an ongoing notification, suppressed +notifications). + +UITT: User Interrupt Target Table - Stores UPID pointer and vector information +for interrupt routing on the sender side. Referred by the senduipi instruction. + +The interrupt state of each task is referenced via MSRs which are saved and +restored by the kernel during context switch. + +Instructions +------------ +senduipi - send a user IPI to a target task based on the UITT index. + +clui - Mask user interrupts by clearing UIF (User Interrupt Flag). + +stui - Unmask user interrupts by setting UIF. + +testui - Test current value of UIF. + +uiret - return from a user interrupt handler. + +User IPI +-------- +When a User IPI sender executes 'senduipi ' the hardware refers the UITT +table entry pointed by the index and posts the interrupt vector into the +receiver's UPID. 
+
+If the receiver is running, the sender cpu would send a physical IPI to the
+receiver's cpu. On the receiver side, this IPI is detected as a User Interrupt.
+The User Interrupt handler for the receiver is invoked and the vector number is
+pushed onto the stack.
+
+Upon execution of 'uiret' in the interrupt handler, control is transferred
+back to the instruction that was interrupted.
+
+Refer to the Intel Software Developer's Manual for more details.
+
+Software Architecture
+=====================
+User Interrupts (Uintr) is an opt-in feature (unlike signals). Applications
+wanting to use Uintr are expected to register themselves with the kernel using
+the Uintr related system calls. A Uintr receiver is always a userspace task. A
+Uintr sender can be another userspace task, the kernel or a device.
+
+1) A receiver can register/unregister an interrupt handler using the Uintr
+receiver related syscalls.
+    uintr_register_handler(handler, flags)
+
+2) A syscall also allows a receiver to register a vector and create a user
+interrupt file descriptor - uintr_fd.
+    uintr_fd = uintr_create_fd(vector, flags)
+
+Uintr can be useful in some of the use cases where eventfd or signals are used
+for frequent userspace event notifications. The semantics of uintr_fd are
+somewhat similar to an eventfd() or the write end of a pipe.
+
+3) Any sender with access to uintr_fd can use it to deliver events (in this
+case - interrupts) to a receiver. A sender task can manage its connection with
+the receiver using the sender related syscalls based on uintr_fd.
+    uipi_index = uintr_register_sender(uintr_fd, flags)
+
+Using an FD abstraction provides a secure mechanism to connect with a receiver.
+The FD sharing and isolation mechanisms put in place by the kernel would extend
+to Uintr as well.
+
+4a) After the initial setup, a sender task can use the SENDUIPI instruction to
+generate user IPIs without any kernel intervention.
+    SENDUIPI <uipi_index>
+
+If the receiver is running (CPL=3), then the user interrupt is delivered
+directly without a kernel transition. If the receiver isn't running, the
+interrupt is delivered when the receiver gets context switched back. If the
+receiver is blocked in the kernel, the user interrupt is delivered to the
+kernel, which then unblocks the intended receiver to deliver the interrupt.
+
+4b) If the sender is the kernel or a device, the uintr_fd can be passed on to
+the related kernel entity to allow it to set up a connection and then generate
+a user interrupt for event delivery.
+
+Refer to the Uintr man pages for details on the syscall interface.
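As a quick end-to-end reference (not part of the patch), the numbered steps
above condense to the small program below. It is essentially a trimmed-down
version of the uipi_sample.c listing in the uintr(7) draft man page: the
syscall numbers are the RFC samples, the intrinsics header is assumed to be
<x86gprintrin.h>, error handling is minimal, and it is built like the sample
(gcc -muintr -mgeneral-regs-only ... -lpthread).

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86gprintrin.h>                      /* assumed intrinsics header */

    #define __NR_uintr_register_handler   449      /* RFC sample numbers */
    #define __NR_uintr_unregister_handler 450
    #define __NR_uintr_create_fd          451
    #define __NR_uintr_register_sender    452
    #define __NR_uintr_unregister_sender  453

    static int uintr_fd;

    static void __attribute__((interrupt))
    ui_handler(struct __uintr_frame *frame, unsigned long long vector)
    {
    }

    static void *sender(void *arg)                 /* steps 3 and 4a */
    {
            int uipi_index = syscall(__NR_uintr_register_sender, uintr_fd, 0);

            if (uipi_index >= 0) {
                    _senduipi(uipi_index);         /* user IPI, no kernel transition */
                    syscall(__NR_uintr_unregister_sender, uintr_fd, 0);
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t pt;

            if (syscall(__NR_uintr_register_handler, ui_handler, 0))   /* step 1 */
                    return 1;
            uintr_fd = syscall(__NR_uintr_create_fd, 0, 0);            /* step 2 */
            if (uintr_fd < 0)
                    return 1;
            _stui();                               /* set UIF, accept interrupts */

            pthread_create(&pt, NULL, sender, NULL);
            pthread_join(pt, NULL);

            close(uintr_fd);                       /* teardown */
            syscall(__NR_uintr_unregister_handler, 0);
            return 0;
    }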
From patchwork Mon Sep 13 20:01:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490711 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2F14C4332F for ; Mon, 13 Sep 2021 20:04:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 804A361165 for ; Mon, 13 Sep 2021 20:04:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347830AbhIMUFu (ORCPT ); Mon, 13 Sep 2021 16:05:50 -0400 Received: from mga05.intel.com ([192.55.52.43]:38692 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347783AbhIMUFr (ORCPT ); Mon, 13 Sep 2021 16:05:47 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336361" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336361" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643909" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:30 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 03/13] x86/cpu: Enumerate User Interrupts support Date: Mon, 13 Sep 2021 13:01:22 -0700 Message-Id: <20210913200132.3396598-4-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org User Interrupts support including user IPIs is enumerated through cpuid. The 'uintr' flag in /proc/cpuinfo can be used to identify it. The recommended mechanism for user applications to detect support is calling the uintr related syscalls. Use CONFIG_X86_USER_INTERRUPTS to compile with User Interrupts support. The feature can be disabled at boot time using the 'nouintr' kernel parameter. SENDUIPI is a special ring-3 instruction that makes a supervisor mode memory access to the UPID and UITT memory. Currently, KPTI needs to be off for User IPIs to work. Processors that support user interrupts are not affected by Meltdown so the auto mode of KPTI will default to off. Users who want to force enable KPTI will need to wait for a later version of this patch series that is compatible with KPTI. 
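As an aside for userspace (not part of this patch): the detection approach
recommended above - call a uintr syscall and check the result rather than
parsing cpuid or /proc/cpuinfo - can be as small as the sketch below. The
syscall numbers are the RFC samples from the uintr(7) draft, probe_handler is
a placeholder, and compiling the handler attribute requires the -muintr
toolchain support.

    #include <unistd.h>
    #include <sys/syscall.h>

    #define __NR_uintr_register_handler   449   /* RFC sample numbers */
    #define __NR_uintr_unregister_handler 450

    static void __attribute__((interrupt))
    probe_handler(struct __uintr_frame *frame, unsigned long long vector)
    {
    }

    /* Returns 1 if Uintr registration works (hardware + kernel support),
     * 0 otherwise (e.g. EOPNOTSUPP or ENOSYS). */
    static int uintr_supported(void)
    {
            if (syscall(__NR_uintr_register_handler, probe_handler, 0))
                    return 0;
            syscall(__NR_uintr_unregister_handler, 0);
            return 1;
    }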
We need to allocate the UPID and UITT structures from a special memory region that has supervisor access but it is mapped into userspace. The plan is to implement a mechanism similar to LDT. Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta --- .../admin-guide/kernel-parameters.txt | 2 + arch/x86/Kconfig | 12 ++++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 ++- arch/x86/include/asm/msr-index.h | 8 +++ arch/x86/include/uapi/asm/processor-flags.h | 2 + arch/x86/kernel/cpu/common.c | 55 +++++++++++++++++++ arch/x86/kernel/cpu/cpuid-deps.c | 1 + 8 files changed, 88 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 91ba391f9b32..471e82be87ff 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3288,6 +3288,8 @@ nofsgsbase [X86] Disables FSGSBASE instructions. + nouintr [X86-64] Disables User Interrupts support. + no_console_suspend [HW] Never suspend the console Disable suspending of consoles during suspend and diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4e001bbbb425..6f7f31e92f3e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1845,6 +1845,18 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS If unsure, say y. +config X86_USER_INTERRUPTS + bool "User Interrupts (UINTR)" + depends on X86_LOCAL_APIC && X86_64 + depends on CPU_SUP_INTEL + help + User Interrupts are events that can be delivered directly to + userspace without a transition through the kernel. The interrupts + could be generated by another userspace application, kernel or a + device. + + Refer, Documentation/x86/user-interrupts.rst for details. + choice prompt "TSX enable mode" depends on CPU_SUP_INTEL diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d0ce5cfd3ac1..634e80ee5db5 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -375,6 +375,7 @@ #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ #define X86_FEATURE_FSRM (18*32+ 4) /* Fast Short Rep Mov */ +#define X86_FEATURE_UINTR (18*32+ 5) /* User Interrupts support */ #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */ #define X86_FEATURE_SRBDS_CTRL (18*32+ 9) /* "" SRBDS mitigation MSR available */ #define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 8f28fafa98b3..27fb1c70ade6 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -65,6 +65,12 @@ # define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31)) #endif +#ifdef CONFIG_X86_USER_INTERRUPTS +# define DISABLE_UINTR 0 +#else +# define DISABLE_UINTR (1 << (X86_FEATURE_UINTR & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -87,7 +93,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) #define DISABLED_MASK17 0 -#define DISABLED_MASK18 0 +#define DISABLED_MASK18 (DISABLE_UINTR) #define DISABLED_MASK19 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index a7c413432b33..4fdba281d002 100644 --- a/arch/x86/include/asm/msr-index.h +++ 
b/arch/x86/include/asm/msr-index.h @@ -375,6 +375,14 @@ #define MSR_HWP_REQUEST 0x00000774 #define MSR_HWP_STATUS 0x00000777 +/* User Interrupt interface */ +#define MSR_IA32_UINTR_RR 0x985 +#define MSR_IA32_UINTR_HANDLER 0x986 +#define MSR_IA32_UINTR_STACKADJUST 0x987 +#define MSR_IA32_UINTR_MISC 0x988 /* 39:32-UINV, 31:0-UITTSZ */ +#define MSR_IA32_UINTR_PD 0x989 +#define MSR_IA32_UINTR_TT 0x98a + /* CPUID.6.EAX */ #define HWP_BASE_BIT (1<<7) #define HWP_NOTIFICATIONS_BIT (1<<8) diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..919ce7f456d4 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_UINTR_BIT 25 /* enable User Interrupts support */ +#define X86_CR4_UINTR _BITUL(X86_CR4_UINTR_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0f8885949e8c..55fee930b6d1 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -308,6 +308,58 @@ static __always_inline void setup_smep(struct cpuinfo_x86 *c) cr4_set_bits(X86_CR4_SMEP); } +static __init int setup_disable_uintr(char *arg) +{ + /* No additional arguments expected */ + if (strlen(arg)) + return 0; + + /* Do not emit a message if the feature is not present. */ + if (!boot_cpu_has(X86_FEATURE_UINTR)) + return 1; + + setup_clear_cpu_cap(X86_FEATURE_UINTR); + pr_info_once("x86: 'nouintr' specified, User Interrupts support disabled\n"); + return 1; +} +__setup("nouintr", setup_disable_uintr); + +static __always_inline void setup_uintr(struct cpuinfo_x86 *c) +{ + /* check the boot processor, plus compile options for UINTR. */ + if (!cpu_feature_enabled(X86_FEATURE_UINTR)) + goto disable_uintr; + + /* checks the current processor's cpuid bits: */ + if (!cpu_has(c, X86_FEATURE_UINTR)) + goto disable_uintr; + + /* + * User Interrupts currently doesn't support PTI. For processors that + * support User interrupts PTI in auto mode will default to off. Need + * this check only for users who have force enabled PTI. + */ + if (boot_cpu_has(X86_FEATURE_PTI)) { + pr_info_once("x86: User Interrupts (UINTR) not enabled. Please disable PTI using 'nopti' kernel parameter\n"); + goto clear_uintr_cap; + } + + cr4_set_bits(X86_CR4_UINTR); + pr_info_once("x86: User Interrupts (UINTR) enabled\n"); + + return; + +clear_uintr_cap: + setup_clear_cpu_cap(X86_FEATURE_UINTR); + +disable_uintr: + /* + * Make sure UINTR is disabled in case it was enabled in a + * previous boot (e.g., via kexec). + */ + cr4_clear_bits(X86_CR4_UINTR); +} + static __init int setup_disable_smap(char *arg) { setup_clear_cpu_cap(X86_FEATURE_SMAP); @@ -1564,6 +1616,9 @@ static void identify_cpu(struct cpuinfo_x86 *c) setup_smap(c); setup_umip(c); + /* Set up User Interrupts */ + setup_uintr(c); + /* Enable FSGSBASE instructions if available. 
*/ if (cpu_has(c, X86_FEATURE_FSGSBASE)) { cr4_set_bits(X86_CR4_FSGSBASE); diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index defda61f372d..6f7eb4af5b4a 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -75,6 +75,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_SGX_LC, X86_FEATURE_SGX }, { X86_FEATURE_SGX1, X86_FEATURE_SGX }, { X86_FEATURE_SGX2, X86_FEATURE_SGX1 }, + { X86_FEATURE_UINTR, X86_FEATURE_XSAVES }, {} }; From patchwork Mon Sep 13 20:01:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490713 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAAD5C433F5 for ; Mon, 13 Sep 2021 20:04:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C773C610A6 for ; Mon, 13 Sep 2021 20:04:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347869AbhIMUFx (ORCPT ); Mon, 13 Sep 2021 16:05:53 -0400 Received: from mga05.intel.com ([192.55.52.43]:38689 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347789AbhIMUFr (ORCPT ); Mon, 13 Sep 2021 16:05:47 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336365" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336365" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643912" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:30 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 04/13] x86/fpu/xstate: Enumerate User Interrupts supervisor state Date: Mon, 13 Sep 2021 13:01:23 -0700 Message-Id: <20210913200132.3396598-5-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Enable xstate supervisor support for User Interrupts by default. The user interrupt state for a task consists of the MSR state and the User Interrupt Flag (UIF) value. XSAVES and XRSTORS handle saving and restoring both of these states. 
Signed-off-by: Sohil Mehta --- arch/x86/include/asm/fpu/types.h | 20 +++++++++++++++++++- arch/x86/include/asm/fpu/xstate.h | 3 ++- arch/x86/kernel/cpu/common.c | 6 ++++++ arch/x86/kernel/fpu/xstate.c | 20 +++++++++++++++++--- 4 files changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index f5a38a5f3ae1..b614f1416bea 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -118,7 +118,7 @@ enum xfeature { XFEATURE_RSRVD_COMP_11, XFEATURE_RSRVD_COMP_12, XFEATURE_RSRVD_COMP_13, - XFEATURE_RSRVD_COMP_14, + XFEATURE_UINTR, XFEATURE_LBR, XFEATURE_MAX, @@ -135,6 +135,7 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_UINTR (1 << XFEATURE_UINTR) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE) @@ -237,6 +238,23 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 14 is supervisor state used for User Interrupts state. + * The size of this state is 48 bytes + */ +struct uintr_state { + u64 handler; + u64 stack_adjust; + u32 uitt_size; + u8 uinv; + u8 pad1; + u8 pad2; + u8 uif_pad3; /* bit 7 - UIF, bits 6:0 - reserved */ + u64 upid_addr; + u64 uirr; + u64 uitt_addr; +} __packed; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 109dfcc75299..4dd4e83c0c9d 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -44,7 +44,8 @@ (XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU) /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_UINTR) /* * A supervisor state component may not always contain valuable information, diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 55fee930b6d1..3a0a3f5cfe0f 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -334,6 +334,12 @@ static __always_inline void setup_uintr(struct cpuinfo_x86 *c) if (!cpu_has(c, X86_FEATURE_UINTR)) goto disable_uintr; + /* Confirm XSAVE support for UINTR is present. */ + if (!cpu_has_xfeatures(XFEATURE_MASK_UINTR, NULL)) { + pr_info_once("x86: User Interrupts (UINTR) not enabled. XSAVE support for UINTR is missing.\n"); + goto clear_uintr_cap; + } + /* * User Interrupts currently doesn't support PTI. For processors that * support User interrupts PTI in auto mode will default to off. 
Need diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c8def1b7f8fb..ab19403effb0 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -38,6 +38,10 @@ static const char *xfeature_names[] = "Processor Trace (unused)" , "Protection Keys User registers", "PASID state", + "unknown xstate feature 11", + "unknown xstate feature 12", + "unknown xstate feature 13", + "User Interrupts registers", "unknown xstate feature" , }; @@ -53,6 +57,10 @@ static short xsave_cpuid_features[] __initdata = { X86_FEATURE_INTEL_PT, X86_FEATURE_PKU, X86_FEATURE_ENQCMD, + -1, /* Unknown 11 */ + -1, /* Unknown 12 */ + -1, /* Unknown 13 */ + X86_FEATURE_UINTR, }; /* @@ -236,6 +244,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_UINTR); } /* @@ -372,7 +381,8 @@ static void __init print_xstate_offset_size(void) XFEATURE_MASK_PKRU | \ XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ - XFEATURE_MASK_PASID) + XFEATURE_MASK_PASID | \ + XFEATURE_MASK_UINTR) /* * setup the xstate image representing the init state @@ -532,6 +542,7 @@ static void check_xstate_against_struct(int nr) XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); + XCHECK_SZ(sz, nr, XFEATURE_UINTR, struct uintr_state); /* * Make *SURE* to add any feature numbers in below if @@ -539,9 +550,12 @@ static void check_xstate_against_struct(int nr) * numbers. */ if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) { + (nr == XFEATURE_RSRVD_COMP_11) || + (nr == XFEATURE_RSRVD_COMP_12) || + (nr == XFEATURE_RSRVD_COMP_13) || + (nr == XFEATURE_LBR) || + (nr >= XFEATURE_MAX)) { WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); } From patchwork Mon Sep 13 20:01:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490719 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04F28C433EF for ; Mon, 13 Sep 2021 20:04:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E46796108B for ; Mon, 13 Sep 2021 20:04:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347932AbhIMUF7 (ORCPT ); Mon, 13 Sep 2021 16:05:59 -0400 Received: from mga05.intel.com ([192.55.52.43]:38692 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347790AbhIMUFr (ORCPT ); Mon, 13 Sep 2021 16:05:47 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336367" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336367" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 
2021 13:04:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643915" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:30 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 05/13] x86/irq: Reserve a user IPI notification vector Date: Mon, 13 Sep 2021 13:01:24 -0700 Message-Id: <20210913200132.3396598-6-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org A user interrupt notification vector is used on the receiver's cpu to identify an interrupt as a user interrupt (and not a kernel interrupt). Hardware uses the same notification vector to generate an IPI from a sender's cpu core when the SENDUIPI instruction is executed. Typically, the kernel shouldn't receive an interrupt with this vector. However, it is possible that the kernel might receive this vector. Scenario that can cause the spurious interrupt: Step cpu 0 (receiver task) cpu 1 (sender task) ---- --------------------- ------------------- 1 task is running 2 executes SENDUIPI 3 IPI sent 4 context switched out 5 IPI delivered (kernel interrupt detected) A kernel interrupt can be detected, if a receiver task gets scheduled out after the SENDUIPI-based IPI was sent but before the IPI was delivered. The kernel doesn't need to do anything in this case other than receiving the interrupt and clearing the local APIC. The user interrupt is always stored in the receiver's UPID before the IPI is generated. When the receiver gets scheduled back the interrupt would be delivered based on its UPID. 
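The ordering described above is what makes the spurious case benign: the vector is posted in the receiver's UPID before the notification IPI is generated, so the kernel only has to acknowledge the stray IPI. Below is a minimal userspace model of that sequence; the structure and function names are made up purely for illustration, and nothing here touches real UPID or APIC state.

#include <stdint.h>
#include <stdio.h>

/* Cut-down model of the UPID: only the posted-interrupt request bitmap. */
struct model_upid {
	uint64_t puir;	/* posted user interrupt requests */
};

/* Steps 2-3 of the scenario: SENDUIPI posts the vector, then notifies. */
static void sender_senduipi(struct model_upid *upid, unsigned int vector)
{
	upid->puir |= 1ULL << vector;	/* latched before the IPI leaves */
	printf("sender: vector %u posted, notification IPI sent\n", vector);
}

/* Step 5: the IPI lands after the receiver was context switched out. */
static void kernel_spurious_notification(void)
{
	/* Nothing to deliver from here; the real handler just acks the APIC. */
	printf("kernel: spurious notification acknowledged\n");
}

/* Later: the receiver is scheduled back in and the vector is still pending. */
static void receiver_resumes(const struct model_upid *upid)
{
	printf("receiver: pending vectors %#llx delivered from the UPID\n",
	       (unsigned long long)upid->puir);
}

int main(void)
{
	struct model_upid upid = { 0 };

	sender_senduipi(&upid, 2);
	kernel_spurious_notification();
	receiver_resumes(&upid);
	return 0;
}

Running the model shows the posted vector still pending when the receiver resumes, which is why the kernel-side handler added in this patch only bumps a counter and acks the local APIC.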
Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta --- arch/x86/include/asm/hardirq.h | 3 +++ arch/x86/include/asm/idtentry.h | 4 ++++ arch/x86/include/asm/irq_vectors.h | 5 ++++- arch/x86/kernel/idt.c | 3 +++ arch/x86/kernel/irq.c | 33 ++++++++++++++++++++++++++++++ 5 files changed, 47 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 275e7fd20310..279afc01f1ac 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -19,6 +19,9 @@ typedef struct { unsigned int kvm_posted_intr_ipis; unsigned int kvm_posted_intr_wakeup_ipis; unsigned int kvm_posted_intr_nested_ipis; +#endif +#ifdef CONFIG_X86_USER_INTERRUPTS + unsigned int uintr_spurious_count; #endif unsigned int x86_platform_ipis; /* arch dependent */ unsigned int apic_perf_irqs; diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 1345088e9902..5929a6f9eeee 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -671,6 +671,10 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_WAKEUP_VECTOR, sysvec_kvm_posted_intr_wakeup DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR, sysvec_kvm_posted_intr_nested_ipi); #endif +#ifdef CONFIG_X86_USER_INTERRUPTS +DECLARE_IDTENTRY_SYSVEC(UINTR_NOTIFICATION_VECTOR, sysvec_uintr_spurious_interrupt); +#endif + #if IS_ENABLED(CONFIG_HYPERV) DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR, sysvec_hyperv_callback); DECLARE_IDTENTRY_SYSVEC(HYPERV_REENLIGHTENMENT_VECTOR, sysvec_hyperv_reenlightenment); diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index 43dcb9284208..d26faa504931 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -104,7 +104,10 @@ #define HYPERV_STIMER0_VECTOR 0xed #endif -#define LOCAL_TIMER_VECTOR 0xec +/* Vector for User interrupt notifications */ +#define UINTR_NOTIFICATION_VECTOR 0xec + +#define LOCAL_TIMER_VECTOR 0xeb #define NR_VECTORS 256 diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index df0fa695bb09..d8c45e0728f0 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -147,6 +147,9 @@ static const __initconst struct idt_data apic_idts[] = { INTG(POSTED_INTR_WAKEUP_VECTOR, asm_sysvec_kvm_posted_intr_wakeup_ipi), INTG(POSTED_INTR_NESTED_VECTOR, asm_sysvec_kvm_posted_intr_nested_ipi), # endif +#ifdef CONFIG_X86_USER_INTERRUPTS + INTG(UINTR_NOTIFICATION_VECTOR, asm_sysvec_uintr_spurious_interrupt), +#endif # ifdef CONFIG_IRQ_WORK INTG(IRQ_WORK_VECTOR, asm_sysvec_irq_work), # endif diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index e28f6a5d14f1..e3c35668c7c5 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -181,6 +181,12 @@ int arch_show_interrupts(struct seq_file *p, int prec) seq_printf(p, "%10u ", irq_stats(j)->kvm_posted_intr_wakeup_ipis); seq_puts(p, " Posted-interrupt wakeup event\n"); +#endif +#ifdef CONFIG_X86_USER_INTERRUPTS + seq_printf(p, "%*s: ", prec, "UIS"); + for_each_online_cpu(j) + seq_printf(p, "%10u ", irq_stats(j)->uintr_spurious_count); + seq_puts(p, " User-interrupt spurious event\n"); #endif return 0; } @@ -325,6 +331,33 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi) } #endif +#ifdef CONFIG_X86_USER_INTERRUPTS +/* + * Handler for UINTR_NOTIFICATION_VECTOR. + * + * The notification vector is used by the cpu to detect a User Interrupt. In + * the typical usage, the cpu would handle this interrupt and clear the local + * apic. 
+ * + * However, it is possible that the kernel might receive this vector. This can + * happen if the receiver thread was running when the interrupt was sent but it + * got scheduled out before the interrupt was delivered. The kernel doesn't + * need to do anything other than clearing the local APIC. A pending user + * interrupt is always saved in the receiver's UPID which can be referenced + * when the receiver gets scheduled back. + * + * If the kernel receives a storm of these, it could mean an issue with the + * kernel's saving and restoring of the User Interrupt MSR state; Specifically, + * the notification vector bits in the IA32_UINTR_MISC_MSR. + */ +DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_uintr_spurious_interrupt) +{ + /* TODO: Add entry-exit tracepoints */ + ack_APIC_irq(); + inc_irq_stat(uintr_spurious_count); +} +#endif + #ifdef CONFIG_HOTPLUG_CPU /* A cpu has been removed from cpu_online_mask. Reset irq affinities. */ From patchwork Mon Sep 13 20:01:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CBEDC4332F for ; Mon, 13 Sep 2021 20:04:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 27D68610A6 for ; Mon, 13 Sep 2021 20:04:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347915AbhIMUF5 (ORCPT ); Mon, 13 Sep 2021 16:05:57 -0400 Received: from mga05.intel.com ([192.55.52.43]:38697 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347806AbhIMUFs (ORCPT ); Mon, 13 Sep 2021 16:05:48 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336371" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336371" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643918" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:31 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 06/13] x86/uintr: Introduce uintr receiver syscalls Date: Mon, 13 Sep 2021 13:01:25 -0700 Message-Id: <20210913200132.3396598-7-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Any application that wants to receive a user interrupt needs to register an interrupt handler with the kernel. Add a registration syscall that sets up the interrupt handler and the related kernel structures for the task that makes this syscall. Only one interrupt handler per task can be registered with the kernel/hardware. Each task has its private interrupt vector space of 64 vectors. The vector registration and the related FD management is covered later. Also add an unregister syscall to let a task unregister the interrupt handler. The UPID for each receiver task needs to be updated whenever a task gets context switched or it moves from one cpu to another. This will also be covered later. The system calls haven't been wired up yet so no real harm is done if we don't update the UPID right now. Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta --- arch/x86/include/asm/processor.h | 6 + arch/x86/include/asm/uintr.h | 13 ++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/uintr_core.c | 240 +++++++++++++++++++++++++++++++ arch/x86/kernel/uintr_fd.c | 58 ++++++++ 5 files changed, 318 insertions(+) create mode 100644 arch/x86/include/asm/uintr.h create mode 100644 arch/x86/kernel/uintr_core.c create mode 100644 arch/x86/kernel/uintr_fd.c diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 9ad2acaaae9b..d229bfac8b4f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -9,6 +9,7 @@ struct task_struct; struct mm_struct; struct io_bitmap; struct vm86; +struct uintr_receiver; #include #include @@ -529,6 +530,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_INTERRUPTS + /* User Interrupt state*/ + struct uintr_receiver *ui_recv; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h new file mode 100644 index 000000000000..4f35bd8bd4e0 --- /dev/null +++ b/arch/x86/include/asm/uintr.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_UINTR_H +#define _ASM_X86_UINTR_H + +#ifdef CONFIG_X86_USER_INTERRUPTS + +bool uintr_arch_enabled(void); +int do_uintr_register_handler(u64 handler); +int do_uintr_unregister_handler(void); + +#endif /* CONFIG_X86_USER_INTERRUPTS */ + +#endif /* _ASM_X86_UINTR_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 8f4e8fa6ed75..060ca9f23e23 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -140,6 +140,7 @@ obj-$(CONFIG_UPROBES) += uprobes.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o +obj-$(CONFIG_X86_USER_INTERRUPTS) += uintr_fd.o uintr_core.o obj-$(CONFIG_X86_UMIP) += umip.o 
obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c new file mode 100644 index 000000000000..2c6042a6840a --- /dev/null +++ b/arch/x86/kernel/uintr_core.c @@ -0,0 +1,240 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2021, Intel Corporation. + * + * Sohil Mehta + * Jacob Pan + */ +#define pr_fmt(fmt) "uintr: " fmt + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +/* User Posted Interrupt Descriptor (UPID) */ +struct uintr_upid { + struct { + u8 status; /* bit 0: ON, bit 1: SN, bit 2-7: reserved */ + u8 reserved1; /* Reserved */ + u8 nv; /* Notification vector */ + u8 reserved2; /* Reserved */ + u32 ndst; /* Notification destination */ + } nc __packed; /* Notification control */ + u64 puir; /* Posted user interrupt requests */ +} __aligned(64); + +/* UPID Notification control status */ +#define UPID_ON 0x0 /* Outstanding notification */ +#define UPID_SN 0x1 /* Suppressed notification */ + +struct uintr_upid_ctx { + struct uintr_upid *upid; + refcount_t refs; +}; + +struct uintr_receiver { + struct uintr_upid_ctx *upid_ctx; +}; + +inline bool uintr_arch_enabled(void) +{ + return static_cpu_has(X86_FEATURE_UINTR); +} + +static inline bool is_uintr_receiver(struct task_struct *t) +{ + return !!t->thread.ui_recv; +} + +static inline u32 cpu_to_ndst(int cpu) +{ + u32 apicid = (u32)apic->cpu_present_to_apicid(cpu); + + WARN_ON_ONCE(apicid == BAD_APICID); + + if (!x2apic_enabled()) + return (apicid << 8) & 0xFF00; + + return apicid; +} + +static void free_upid(struct uintr_upid_ctx *upid_ctx) +{ + kfree(upid_ctx->upid); + upid_ctx->upid = NULL; + kfree(upid_ctx); +} + +/* TODO: UPID needs to be allocated by a KPTI compatible allocator */ +static struct uintr_upid_ctx *alloc_upid(void) +{ + struct uintr_upid_ctx *upid_ctx; + struct uintr_upid *upid; + + upid_ctx = kzalloc(sizeof(*upid_ctx), GFP_KERNEL); + if (!upid_ctx) + return NULL; + + upid = kzalloc(sizeof(*upid), GFP_KERNEL); + + if (!upid) { + kfree(upid_ctx); + return NULL; + } + + upid_ctx->upid = upid; + refcount_set(&upid_ctx->refs, 1); + + return upid_ctx; +} + +static void put_upid_ref(struct uintr_upid_ctx *upid_ctx) +{ + if (refcount_dec_and_test(&upid_ctx->refs)) + free_upid(upid_ctx); +} + +int do_uintr_unregister_handler(void) +{ + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + struct uintr_receiver *ui_recv; + u64 msr64; + + if (!is_uintr_receiver(t)) + return -EINVAL; + + pr_debug("recv: Unregister handler and clear MSRs for task=%d\n", + t->pid); + + /* + * TODO: Evaluate usage of fpregs_lock() and get_xsave_addr(). Bugs + * have been reported recently for PASID and WRPKRU. + * + * UPID and ui_recv will be referenced during context switch. Need to + * disable preemption while modifying the MSRs, UPID and ui_recv thread + * struct. + */ + fpregs_lock(); + + /* Clear only the receiver specific state. 
Sender related state is not modified */ + if (fpregs_state_valid(fpu, smp_processor_id())) { + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, msr64); + msr64 &= ~GENMASK_ULL(39, 32); + wrmsrl(MSR_IA32_UINTR_MISC, msr64); + wrmsrl(MSR_IA32_UINTR_PD, 0ULL); + wrmsrl(MSR_IA32_UINTR_RR, 0ULL); + wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL); + wrmsrl(MSR_IA32_UINTR_HANDLER, 0ULL); + } else { + struct uintr_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->handler = 0; + p->stack_adjust = 0; + p->upid_addr = 0; + p->uinv = 0; + p->uirr = 0; + } + } + + ui_recv = t->thread.ui_recv; + /* + * Suppress notifications so that no further interrupts are generated + * based on this UPID. + */ + set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); + + put_upid_ref(ui_recv->upid_ctx); + kfree(ui_recv); + t->thread.ui_recv = NULL; + + fpregs_unlock(); + + return 0; +} + +int do_uintr_register_handler(u64 handler) +{ + struct uintr_receiver *ui_recv; + struct uintr_upid *upid; + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + u64 misc_msr; + int cpu; + + if (is_uintr_receiver(t)) + return -EBUSY; + + ui_recv = kzalloc(sizeof(*ui_recv), GFP_KERNEL); + if (!ui_recv) + return -ENOMEM; + + ui_recv->upid_ctx = alloc_upid(); + if (!ui_recv->upid_ctx) { + kfree(ui_recv); + pr_debug("recv: alloc upid failed for task=%d\n", t->pid); + return -ENOMEM; + } + + /* + * TODO: Evaluate usage of fpregs_lock() and get_xsave_addr(). Bugs + * have been reported recently for PASID and WRPKRU. + * + * UPID and ui_recv will be referenced during context switch. Need to + * disable preemption while modifying the MSRs, UPID and ui_recv thread + * struct. + */ + fpregs_lock(); + + cpu = smp_processor_id(); + upid = ui_recv->upid_ctx->upid; + upid->nc.nv = UINTR_NOTIFICATION_VECTOR; + upid->nc.ndst = cpu_to_ndst(cpu); + + t->thread.ui_recv = ui_recv; + + if (fpregs_state_valid(fpu, cpu)) { + wrmsrl(MSR_IA32_UINTR_HANDLER, handler); + wrmsrl(MSR_IA32_UINTR_PD, (u64)ui_recv->upid_ctx->upid); + + /* Set value as size of ABI redzone */ + wrmsrl(MSR_IA32_UINTR_STACKADJUST, 128); + + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, misc_msr); + misc_msr |= (u64)UINTR_NOTIFICATION_VECTOR << 32; + wrmsrl(MSR_IA32_UINTR_MISC, misc_msr); + } else { + struct xregs_state *xsave; + struct uintr_state *p; + + xsave = &fpu->state.xsave; + xsave->header.xfeatures |= XFEATURE_MASK_UINTR; + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->handler = handler; + p->upid_addr = (u64)ui_recv->upid_ctx->upid; + p->stack_adjust = 128; + p->uinv = UINTR_NOTIFICATION_VECTOR; + } + } + + fpregs_unlock(); + + pr_debug("recv: task=%d register handler=%llx upid %px\n", + t->pid, handler, upid); + + return 0; +} diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c new file mode 100644 index 000000000000..a1a9c105fdab --- /dev/null +++ b/arch/x86/kernel/uintr_fd.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2021, Intel Corporation. + * + * Sohil Mehta + */ +#define pr_fmt(fmt) "uintr: " fmt + +#include +#include + +#include + +/* + * sys_uintr_register_handler - setup user interrupt handler for receiver. 
+ */ +SYSCALL_DEFINE2(uintr_register_handler, u64 __user *, handler, unsigned int, flags) +{ + int ret; + + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + /* TODO: Validate the handler address */ + if (!handler) + return -EFAULT; + + ret = do_uintr_register_handler((u64)handler); + + pr_debug("recv: register handler task=%d flags %d handler %lx ret %d\n", + current->pid, flags, (unsigned long)handler, ret); + + return ret; +} + +/* + * sys_uintr_unregister_handler - Teardown user interrupt handler for receiver. + */ +SYSCALL_DEFINE1(uintr_unregister_handler, unsigned int, flags) +{ + int ret; + + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + ret = do_uintr_unregister_handler(); + + pr_debug("recv: unregister handler task=%d flags %d ret %d\n", + current->pid, flags, ret); + + return ret; +} From patchwork Mon Sep 13 20:01:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490721 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98553C433F5 for ; Mon, 13 Sep 2021 20:04:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 804596108B for ; Mon, 13 Sep 2021 20:04:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347979AbhIMUGI (ORCPT ); Mon, 13 Sep 2021 16:06:08 -0400 Received: from mga05.intel.com ([192.55.52.43]:38689 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347850AbhIMUFw (ORCPT ); Mon, 13 Sep 2021 16:05:52 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336375" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336375" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643922" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:31 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 07/13] x86/process/64: Add uintr task context switch support Date: Mon, 13 Sep 2021 13:01:26 -0700 Message-Id: <20210913200132.3396598-8-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org User interrupt state is saved and restored using xstate supervisor feature support. This includes the MSR state and the User Interrupt Flag (UIF) value. During context switch update the UPID for a uintr task to reflect the current state of the task; namely whether the task should receive interrupt notifications and which cpu the task is currently running on. XSAVES clears the notification vector (UINV) in the MISC MSR to prevent interrupts from being recognized in the UIRR MSR while the task is being context switched. The UINV is restored back when the kernel does an XRSTORS. However, this conflicts with the kernel's lazy restore optimization which skips an XRSTORS if the kernel is scheduling the same user task back and the underlying MSR state hasn't been modified. Special handling is needed for a uintr task in the context switch path to keep using this optimization. Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta --- arch/x86/include/asm/entry-common.h | 4 ++ arch/x86/include/asm/uintr.h | 9 ++++ arch/x86/kernel/fpu/core.c | 8 +++ arch/x86/kernel/process_64.c | 4 ++ arch/x86/kernel/uintr_core.c | 75 +++++++++++++++++++++++++++++ 5 files changed, 100 insertions(+) diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index 14ebd2196569..4e6c4d0912a5 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -8,6 +8,7 @@ #include #include #include +#include /* Check that the stack and regs on entry from user mode are sane. */ static __always_inline void arch_check_user_regs(struct pt_regs *regs) @@ -57,6 +58,9 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, if (unlikely(ti_work & _TIF_NEED_FPU_LOAD)) switch_fpu_return(); + if (static_cpu_has(X86_FEATURE_UINTR)) + switch_uintr_return(); + #ifdef CONFIG_COMPAT /* * Compat syscalls set TS_COMPAT. 
Make sure we clear it before diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index 4f35bd8bd4e0..f7ccb67014b8 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -8,6 +8,15 @@ bool uintr_arch_enabled(void); int do_uintr_register_handler(u64 handler); int do_uintr_unregister_handler(void); +/* TODO: Inline the context switch related functions */ +void switch_uintr_prepare(struct task_struct *prev); +void switch_uintr_return(void); + +#else /* !CONFIG_X86_USER_INTERRUPTS */ + +static inline void switch_uintr_prepare(struct task_struct *prev) {} +static inline void switch_uintr_return(void) {} + #endif /* CONFIG_X86_USER_INTERRUPTS */ #endif /* _ASM_X86_UINTR_H */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 7ada7bd03a32..e30588bf7ce9 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -95,6 +95,14 @@ EXPORT_SYMBOL(irq_fpu_usable); * over the place. * * FXSAVE and all XSAVE variants preserve the FPU register state. + * + * When XSAVES is called with XFEATURE_UINTR enabled it + * saves the FPU state and clears the interrupt notification + * vector byte of the MISC_MSR [bits 39:32]. This is required + * to stop detecting additional User Interrupts after we + * have saved the FPU state. Before going back to userspace + * we would correct this and only program the byte that was + * cleared. */ void save_fpregs_to_fpstate(struct fpu *fpu) { diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index ec0d836a13b1..62b82137db9c 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #ifdef CONFIG_IA32_EMULATION @@ -565,6 +566,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(hardirq_stack_inuse)); + if (static_cpu_has(X86_FEATURE_UINTR)) + switch_uintr_prepare(prev_p); + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) switch_fpu_prepare(prev_fpu, cpu); diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index 2c6042a6840a..7a29888050ad 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -238,3 +238,78 @@ int do_uintr_register_handler(u64 handler) return 0; } + +/* Suppress notifications since this task is being context switched out */ +void switch_uintr_prepare(struct task_struct *prev) +{ + struct uintr_upid *upid; + + if (is_uintr_receiver(prev)) { + upid = prev->thread.ui_recv->upid_ctx->upid; + set_bit(UPID_SN, (unsigned long *)&upid->nc.status); + } +} + +/* + * Do this right before we are going back to userspace after the FPU has been + * reloaded i.e. TIF_NEED_FPU_LOAD is clear. + * Called from arch_exit_to_user_mode_prepare() with interrupts disabled. + */ +void switch_uintr_return(void) +{ + struct uintr_upid *upid; + u64 misc_msr; + + if (is_uintr_receiver(current)) { + /* + * The XSAVES instruction clears the UINTR notification + * vector(UINV) in the UINT_MISC MSR when user context gets + * saved. Before going back to userspace we need to restore the + * notification vector. XRSTORS would automatically restore the + * notification but we can't be sure that XRSTORS will always + * be called when going back to userspace. Also if XSAVES gets + * called twice the UINV stored in the Xstate buffer will be + * overwritten. Threfore, before going back to userspace we + * always check if the UINV is set and reprogram if needed. 
+ * + * Alternatively, we could combine this with + * switch_fpu_return() and program the MSR whenever we are + * skipping the XRSTORS. We need special precaution to make + * sure the UINV value in the XSTATE buffer doesn't get + * overwritten by calling XSAVES twice. + */ + WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD)); + + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, misc_msr); + if (!(misc_msr & GENMASK_ULL(39, 32))) { + misc_msr |= (u64)UINTR_NOTIFICATION_VECTOR << 32; + wrmsrl(MSR_IA32_UINTR_MISC, misc_msr); + } + + /* + * It is necessary to clear the SN bit after we set UINV and + * NDST to avoid incorrect interrupt routing. + */ + upid = current->thread.ui_recv->upid_ctx->upid; + upid->nc.ndst = cpu_to_ndst(smp_processor_id()); + clear_bit(UPID_SN, (unsigned long *)&upid->nc.status); + + /* + * Interrupts might have accumulated in the UPID while the + * thread was preempted. In this case invoke the hardware + * detection sequence manually by sending a self IPI with UINV. + * Since UINV is set and SN is cleared, any new UINTR + * notifications due to the self IPI or otherwise would result + * in the hardware updating the UIRR directly. + * No real interrupt would be generated as a result of this. + * + * The alternative is to atomically read and clear the UPID and + * program the UIRR. In that case the kernel would need to + * carefully manage the race with the hardware if the UPID gets + * updated after the read. + */ + if (READ_ONCE(upid->puir)) + apic->send_IPI_self(UINTR_NOTIFICATION_VECTOR); + } +} From patchwork Mon Sep 13 20:01:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C58EC4332F for ; Mon, 13 Sep 2021 20:04:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 18D3861107 for ; Mon, 13 Sep 2021 20:04:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348035AbhIMUGK (ORCPT ); Mon, 13 Sep 2021 16:06:10 -0400 Received: from mga05.intel.com ([192.55.52.43]:38692 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347852AbhIMUFx (ORCPT ); Mon, 13 Sep 2021 16:05:53 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336380" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336380" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643925" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:31 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 08/13] x86/process/64: Clean up uintr task fork and exit paths Date: Mon, 13 Sep 2021 13:01:27 -0700 Message-Id: <20210913200132.3396598-9-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org The user interrupt MSRs and the user interrupt state is task specific. During task fork and exit clear the task state, clear the MSRs and dereference the shared resources. Some of the memory resources like the UPID are referenced in the file descriptor and could be in use while the uintr_fd is still valid. Instead of freeing up the UPID just dereference it. Eventually when every user releases the reference the memory resource will be freed up. Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta --- arch/x86/include/asm/uintr.h | 3 ++ arch/x86/kernel/fpu/core.c | 9 ++++++ arch/x86/kernel/process.c | 9 ++++++ arch/x86/kernel/uintr_core.c | 55 ++++++++++++++++++++++++++++++++++++ 4 files changed, 76 insertions(+) diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index f7ccb67014b8..cef4dd81d40e 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -8,12 +8,15 @@ bool uintr_arch_enabled(void); int do_uintr_register_handler(u64 handler); int do_uintr_unregister_handler(void); +void uintr_free(struct task_struct *task); + /* TODO: Inline the context switch related functions */ void switch_uintr_prepare(struct task_struct *prev); void switch_uintr_return(void); #else /* !CONFIG_X86_USER_INTERRUPTS */ +static inline void uintr_free(struct task_struct *task) {} static inline void switch_uintr_prepare(struct task_struct *prev) {} static inline void switch_uintr_return(void) {} diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index e30588bf7ce9..c0a54f7aaa2a 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -260,6 +260,7 @@ int fpu_clone(struct task_struct *dst) { struct fpu *src_fpu = ¤t->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; + struct uintr_state *uintr_state; /* The new task's FPU state cannot be valid in the hardware. */ dst_fpu->last_cpu = -1; @@ -284,6 +285,14 @@ int fpu_clone(struct task_struct *dst) else save_fpregs_to_fpstate(dst_fpu); + + /* UINTR state is not expected to be inherited (in the current design). 
*/ + if (static_cpu_has(X86_FEATURE_UINTR)) { + uintr_state = get_xsave_addr(&dst_fpu->state.xsave, XFEATURE_UINTR); + if (uintr_state) + memset(uintr_state, 0, sizeof(*uintr_state)); + } + fpregs_unlock(); set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 1d9463e3096b..83677f76bd7b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -87,6 +88,12 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_VM86 dst->thread.vm86 = NULL; #endif + +#ifdef CONFIG_X86_USER_INTERRUPTS + /* User Interrupt state is unique for each task */ + dst->thread.ui_recv = NULL; +#endif + return fpu_clone(dst); } @@ -103,6 +110,8 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + uintr_free(tsk); + fpu__drop(fpu); } diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index 7a29888050ad..a2a13f890139 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -313,3 +313,58 @@ void switch_uintr_return(void) apic->send_IPI_self(UINTR_NOTIFICATION_VECTOR); } } + +/* + * This should only be called from exit_thread(). + * exit_thread() can happen in current context when the current thread is + * exiting or it can happen for a new thread that is being created. + * For new threads is_uintr_receiver() should fail. + */ +void uintr_free(struct task_struct *t) +{ + struct uintr_receiver *ui_recv; + struct fpu *fpu; + + if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_receiver(t)) + return; + + if (WARN_ON_ONCE(t != current)) + return; + + fpu = &t->thread.fpu; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + wrmsrl(MSR_IA32_UINTR_MISC, 0ULL); + wrmsrl(MSR_IA32_UINTR_PD, 0ULL); + wrmsrl(MSR_IA32_UINTR_RR, 0ULL); + wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL); + wrmsrl(MSR_IA32_UINTR_HANDLER, 0ULL); + } else { + struct uintr_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->handler = 0; + p->uirr = 0; + p->upid_addr = 0; + p->stack_adjust = 0; + p->uinv = 0; + } + } + + /* Check: Can a thread be context switched while it is exiting? */ + ui_recv = t->thread.ui_recv; + + /* + * Suppress notifications so that no further interrupts are + * generated based on this UPID. 
+ */ + set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); + put_upid_ref(ui_recv->upid_ctx); + kfree(ui_recv); + t->thread.ui_recv = NULL; + + fpregs_unlock(); +} From patchwork Mon Sep 13 20:01:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490723 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04A45C433FE for ; Mon, 13 Sep 2021 20:04:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E1805610FE for ; Mon, 13 Sep 2021 20:04:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347997AbhIMUGJ (ORCPT ); Mon, 13 Sep 2021 16:06:09 -0400 Received: from mga05.intel.com ([192.55.52.43]:38697 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347857AbhIMUFx (ORCPT ); Mon, 13 Sep 2021 16:05:53 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336382" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336382" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643928" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:32 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 09/13] x86/uintr: Introduce vector registration and uintr_fd syscall Date: Mon, 13 Sep 2021 13:01:28 -0700 Message-Id: <20210913200132.3396598-10-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Each receiving task has its own interrupt vector space of 64 vectors. For each vector registered by a task create a uintr_fd. Only tasks that have previously registered a user interrupt handler can register a vector. The sender for the user interrupt could be another userspace application, kernel or an external source (like a device). Any sender that wants to generate a user interrupt needs access to receiver's vector number and UPID. uintr_fd abstracts that information and allows a sender with access to uintr_fd to connect and generate a user interrupt. 
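To make the receiver-side flow concrete, here is a hypothetical userspace sketch that registers a handler and then creates a uintr_fd for vector 2 using the syscalls introduced in this series. The syscall numbers, the vector choice, and the stub handler are assumptions for illustration only: the syscalls are wired up separately, and a real handler has to be built with user-interrupt aware compiler support so that it returns with UIRET.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Placeholder syscall numbers for illustration; use the wired-up values. */
#define __NR_uintr_register_handler	449
#define __NR_uintr_create_fd		450

/*
 * Placeholder entry point: a real handler is entered by hardware and must
 * return with UIRET, so it cannot be written as a plain C function like this.
 */
void uintr_handler(void)
{
}

int main(void)
{
	int uintr_fd;

	/* One handler per thread; the flags argument is reserved (0). */
	if (syscall(__NR_uintr_register_handler, uintr_handler, 0))
		return 1;

	/* Register vector 2 and get a uintr_fd to hand out to senders. */
	uintr_fd = (int)syscall(__NR_uintr_create_fd, 2, 0);
	if (uintr_fd < 0)
		return 1;

	printf("uintr_fd %d ready to be shared with senders\n", uintr_fd);
	return 0;
}

The returned descriptor is then shared with prospective senders through the mechanisms described below.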
Upon interrupt delivery, the interrupt handler would be invoked with the associated vector number pushed onto the stack. Using an FD abstraction automatically provides a secure mechanism to connect with a receiver. It also makes the tracking and management of the interrupt vector resource easier for userspace. uintr_fd can be useful in some of the usages where eventfd is used for userspace event notifications. Though uintr_fd is nowhere close to a drop-in replacement, the semantics are meant to be somewhat similar to an eventfd or the write end of a pipe. Access to uintr_fd can be achieved in the following ways: - direct access if the task is part of the same thread group (process) - inherited by a child process. - explicitly shared using any of the FD sharing mechanisms. If the sender is another userspace task, it can use the uintr_fd to send user IPIs to the receiver. This works in conjunction with the SENDUIPI instruction. The details related to this are covered later. The exact APIs for the sender being a kernel or another external source are still being worked upon. The general idea is that the receiver would pass the uintr_fd to the kernel by extending some existing API (like io_uring). The vector associated with uintr_fd can be unregistered by closing all references to the uintr_fd. Signed-off-by: Sohil Mehta --- arch/x86/include/asm/uintr.h | 14 ++++ arch/x86/kernel/uintr_core.c | 129 +++++++++++++++++++++++++++++++++-- arch/x86/kernel/uintr_fd.c | 94 +++++++++++++++++++++++++ 3 files changed, 232 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index cef4dd81d40e..1f00e2a63da4 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -4,9 +4,23 @@ #ifdef CONFIG_X86_USER_INTERRUPTS +struct uintr_upid_ctx { + struct task_struct *task; /* Receiver task */ + struct uintr_upid *upid; + refcount_t refs; +}; + +struct uintr_receiver_info { + struct uintr_upid_ctx *upid_ctx; /* UPID context */ + struct callback_head twork; /* Task work head */ + u64 uvec; /* Vector number */ +}; + bool uintr_arch_enabled(void); int do_uintr_register_handler(u64 handler); int do_uintr_unregister_handler(void); +int do_uintr_register_vector(struct uintr_receiver_info *r_info); +void do_uintr_unregister_vector(struct uintr_receiver_info *r_info); void uintr_free(struct task_struct *task); diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index a2a13f890139..9dcb9f60e5bc 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -20,6 +21,8 @@ #include #include +#define UINTR_MAX_UVEC_NR 64 + /* User Posted Interrupt Descriptor (UPID) */ struct uintr_upid { struct { @@ -36,13 +39,9 @@ struct uintr_upid { #define UPID_ON 0x0 /* Outstanding notification */ #define UPID_SN 0x1 /* Suppressed notification */ -struct uintr_upid_ctx { - struct uintr_upid *upid; - refcount_t refs; -}; - struct uintr_receiver { struct uintr_upid_ctx *upid_ctx; + u64 uvec_mask; /* track active vector per bit */ }; inline bool uintr_arch_enabled(void) @@ -69,6 +68,7 @@ static inline u32 cpu_to_ndst(int cpu) static void free_upid(struct uintr_upid_ctx *upid_ctx) { + put_task_struct(upid_ctx->task); kfree(upid_ctx->upid); upid_ctx->upid = NULL; kfree(upid_ctx); @@ -93,6 +93,7 @@ static struct uintr_upid_ctx *alloc_upid(void) upid_ctx->upid = upid; refcount_set(&upid_ctx->refs, 1); + upid_ctx->task = get_task_struct(current); return upid_ctx; } @@ 
-103,6 +104,77 @@ static void put_upid_ref(struct uintr_upid_ctx *upid_ctx) free_upid(upid_ctx); } +static struct uintr_upid_ctx *get_upid_ref(struct uintr_upid_ctx *upid_ctx) +{ + refcount_inc(&upid_ctx->refs); + return upid_ctx; +} + +static void __clear_vector_from_upid(u64 uvec, struct uintr_upid *upid) +{ + clear_bit(uvec, (unsigned long *)&upid->puir); +} + +static void __clear_vector_from_task(u64 uvec) +{ + struct task_struct *t = current; + + pr_debug("recv: task=%d free vector %llu\n", t->pid, uvec); + + if (!(BIT_ULL(uvec) & t->thread.ui_recv->uvec_mask)) + return; + + clear_bit(uvec, (unsigned long *)&t->thread.ui_recv->uvec_mask); + + if (!t->thread.ui_recv->uvec_mask) + pr_debug("recv: task=%d unregistered all user vectors\n", t->pid); +} + +/* Callback to clear the vector structures when a vector is unregistered. */ +static void receiver_clear_uvec(struct callback_head *head) +{ + struct uintr_receiver_info *r_info; + struct uintr_upid_ctx *upid_ctx; + struct task_struct *t = current; + u64 uvec; + + r_info = container_of(head, struct uintr_receiver_info, twork); + uvec = r_info->uvec; + upid_ctx = r_info->upid_ctx; + + /* + * If a task has unregistered the interrupt handler the vector + * structures would have already been cleared. + */ + if (is_uintr_receiver(t)) { + /* + * The UPID context in the callback might differ from the one + * on the task if the task unregistered its interrupt handler + * and then registered itself again. The vector structures + * related to the previous UPID would have already been cleared + * in that case. + */ + if (t->thread.ui_recv->upid_ctx != upid_ctx) { + pr_debug("recv: task %d is now using a different UPID\n", + t->pid); + goto out_free; + } + + /* + * If the vector has been recognized in the UIRR don't modify + * it. We need to disable User Interrupts before modifying the + * UIRR. It might be better to just let that interrupt get + * delivered. + */ + __clear_vector_from_upid(uvec, upid_ctx->upid); + __clear_vector_from_task(uvec); + } + +out_free: + put_upid_ref(upid_ctx); + kfree(r_info); +} + int do_uintr_unregister_handler(void) { struct task_struct *t = current; @@ -239,6 +311,53 @@ int do_uintr_register_handler(u64 handler) return 0; } +void do_uintr_unregister_vector(struct uintr_receiver_info *r_info) +{ + int ret; + + pr_debug("recv: Adding task work to clear vector %llu added for task=%d\n", + r_info->uvec, r_info->upid_ctx->task->pid); + + init_task_work(&r_info->twork, receiver_clear_uvec); + ret = task_work_add(r_info->upid_ctx->task, &r_info->twork, true); + if (ret) { + pr_debug("recv: Clear vector task=%d has already exited\n", + r_info->upid_ctx->task->pid); + put_upid_ref(r_info->upid_ctx); + kfree(r_info); + return; + } +} + +int do_uintr_register_vector(struct uintr_receiver_info *r_info) +{ + struct uintr_receiver *ui_recv; + struct task_struct *t = current; + + /* + * A vector should only be registered by a task that + * has an interrupt handler registered. 
+ */ + if (!is_uintr_receiver(t)) + return -EINVAL; + + if (r_info->uvec >= UINTR_MAX_UVEC_NR) + return -ENOSPC; + + ui_recv = t->thread.ui_recv; + + if (ui_recv->uvec_mask & BIT_ULL(r_info->uvec)) + return -EBUSY; + + ui_recv->uvec_mask |= BIT_ULL(r_info->uvec); + pr_debug("recv: task %d new uvec=%llu, new mask %llx\n", + t->pid, r_info->uvec, ui_recv->uvec_mask); + + r_info->upid_ctx = get_upid_ref(ui_recv->upid_ctx); + + return 0; +} + /* Suppress notifications since this task is being context switched out */ void switch_uintr_prepare(struct task_struct *prev) { diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c index a1a9c105fdab..f0548bbac776 100644 --- a/arch/x86/kernel/uintr_fd.c +++ b/arch/x86/kernel/uintr_fd.c @@ -6,11 +6,105 @@ */ #define pr_fmt(fmt) "uintr: " fmt +#include +#include #include #include #include +struct uintrfd_ctx { + struct uintr_receiver_info *r_info; +}; + +#ifdef CONFIG_PROC_FS +static void uintrfd_show_fdinfo(struct seq_file *m, struct file *file) +{ + struct uintrfd_ctx *uintrfd_ctx = file->private_data; + + /* Check: Should we print the receiver and sender info here? */ + seq_printf(m, "user_vector:%llu\n", uintrfd_ctx->r_info->uvec); +} +#endif + +static int uintrfd_release(struct inode *inode, struct file *file) +{ + struct uintrfd_ctx *uintrfd_ctx = file->private_data; + + pr_debug("recv: Release uintrfd for r_task %d uvec %llu\n", + uintrfd_ctx->r_info->upid_ctx->task->pid, + uintrfd_ctx->r_info->uvec); + + do_uintr_unregister_vector(uintrfd_ctx->r_info); + kfree(uintrfd_ctx); + + return 0; +} + +static const struct file_operations uintrfd_fops = { +#ifdef CONFIG_PROC_FS + .show_fdinfo = uintrfd_show_fdinfo, +#endif + .release = uintrfd_release, + .llseek = noop_llseek, +}; + +/* + * sys_uintr_create_fd - Create a uintr_fd for the registered interrupt vector. + */ +SYSCALL_DEFINE2(uintr_create_fd, u64, vector, unsigned int, flags) +{ + struct uintrfd_ctx *uintrfd_ctx; + int uintrfd; + int ret; + + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + uintrfd_ctx = kzalloc(sizeof(*uintrfd_ctx), GFP_KERNEL); + if (!uintrfd_ctx) + return -ENOMEM; + + uintrfd_ctx->r_info = kzalloc(sizeof(*uintrfd_ctx->r_info), GFP_KERNEL); + if (!uintrfd_ctx->r_info) { + ret = -ENOMEM; + goto out_free_ctx; + } + + uintrfd_ctx->r_info->uvec = vector; + ret = do_uintr_register_vector(uintrfd_ctx->r_info); + if (ret) { + kfree(uintrfd_ctx->r_info); + goto out_free_ctx; + } + + /* TODO: Get user input for flags - UFD_CLOEXEC */ + /* Check: Do we need O_NONBLOCK? */ + uintrfd = anon_inode_getfd("[uintrfd]", &uintrfd_fops, uintrfd_ctx, + O_RDONLY | O_CLOEXEC | O_NONBLOCK); + + if (uintrfd < 0) { + ret = uintrfd; + goto out_free_uvec; + } + + pr_debug("recv: Alloc vector success uintrfd %d uvec %llu for task=%d\n", + uintrfd, uintrfd_ctx->r_info->uvec, current->pid); + + return uintrfd; + +out_free_uvec: + do_uintr_unregister_vector(uintrfd_ctx->r_info); +out_free_ctx: + kfree(uintrfd_ctx); + pr_debug("recv: Alloc vector failed for task=%d ret %d\n", + current->pid, ret); + return ret; +} + /* * sys_uintr_register_handler - setup user interrupt handler for receiver. 
*/ From patchwork Mon Sep 13 20:01:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490729 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34833C4332F for ; Mon, 13 Sep 2021 20:05:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 20DE161106 for ; Mon, 13 Sep 2021 20:05:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348142AbhIMUG3 (ORCPT ); Mon, 13 Sep 2021 16:06:29 -0400 Received: from mga05.intel.com ([192.55.52.43]:38689 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347968AbhIMUGH (ORCPT ); Mon, 13 Sep 2021 16:06:07 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336385" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336385" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643931" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:32 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 10/13] x86/uintr: Introduce user IPI sender syscalls Date: Mon, 13 Sep 2021 13:01:29 -0700 Message-Id: <20210913200132.3396598-11-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Add a registration syscall for a task to register itself as a user interrupt sender using the uintr_fd generated by the receiver. A task can register multiple uintr_fds. Each unique successful connection creates a new entry in the User Interrupt Target Table (UITT). Each entry in the UITT table is referred by the UITT index (uipi_index). The uipi_index returned during the registration syscall lets a sender generate a user IPI using the 'SENDUIPI ' instruction. Also, add a sender unregister syscall to unregister a particular task from the uintr_fd. Calling close on the uintr_fd will disconnect all threads in a sender process from that FD. Currently, the UITT size is arbitrarily chosen as 256 entries corresponding to a 4KB page. Based on feedback and usage data this can either be increased/decreased or made dynamic later. 
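For illustration, a hypothetical sender-side sketch built on the description above: it registers against an already-obtained uintr_fd and then issues SENDUIPI with the returned uipi_index. The syscall number and its (uintr_fd, flags) signature, the fd value, and the inline-assembly helper are assumptions, not part of this patch; the instruction also needs toolchain support.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Placeholder number and assumed (uintr_fd, flags) signature. */
#define __NR_uintr_register_sender	451

/* SENDUIPI takes the uipi_index in a 64-bit register; this helper assumes
 * an assembler that knows the mnemonic (the _senduipi intrinsic from
 * <x86gprintrin.h> built with -muintr is an alternative). */
static inline void senduipi(unsigned long long uipi_index)
{
	asm volatile ("senduipi %0" : : "r" (uipi_index));
}

int main(void)
{
	int uintr_fd = 3;	/* assumed: receiver's fd inherited or passed in */
	long uipi_index;

	uipi_index = syscall(__NR_uintr_register_sender, uintr_fd, 0);
	if (uipi_index < 0)
		return 1;

	/* The index selects this thread's UITT entry, which points at the
	 * receiver's UPID and carries the registered vector. */
	senduipi((unsigned long long)uipi_index);

	printf("user IPI sent via uipi_index %ld\n", uipi_index);
	return 0;
}

Note that the returned uipi_index indexes the registering thread's own UITT, which ties into the per-thread design discussed next.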
Architecturally, the UITT table can be unique for each thread or shared across threads of the same thread group. The current implementation keeps the UITT as unique for the each thread. This makes the kernel implementation relatively simple and only threads that use uintr get setup with the related structures. However, this means that the uipi_index for each thread would be inconsistent wrt to other threads. (Executing 'SENDUIPI 2' on threads of the same process could generate different user interrupts.) Alternatively, the benefit of sharing the UITT table is that all threads would see the same view of the UITT table. Also the kernel UITT memory allocation would be more efficient if multiple threads connect to the same uintr_fd. However, this would mean the kernel needs to keep the UITT table size MISC_MSR[] in sync across these threads. Also the UPID/UITT teardown flows might need additional consideration. Signed-off-by: Sohil Mehta --- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/uintr.h | 15 ++ arch/x86/kernel/process.c | 1 + arch/x86/kernel/uintr_core.c | 355 ++++++++++++++++++++++++++++++- arch/x86/kernel/uintr_fd.c | 133 ++++++++++++ 5 files changed, 495 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index d229bfac8b4f..3482c3182e39 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -10,6 +10,7 @@ struct mm_struct; struct io_bitmap; struct vm86; struct uintr_receiver; +struct uintr_sender; #include #include @@ -533,6 +534,7 @@ struct thread_struct { #ifdef CONFIG_X86_USER_INTERRUPTS /* User Interrupt state*/ struct uintr_receiver *ui_recv; + struct uintr_sender *ui_send; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index 1f00e2a63da4..ef3521dd7fb9 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -8,6 +8,7 @@ struct uintr_upid_ctx { struct task_struct *task; /* Receiver task */ struct uintr_upid *upid; refcount_t refs; + bool receiver_active; /* Flag for UPID being mapped to a receiver */ }; struct uintr_receiver_info { @@ -16,12 +17,26 @@ struct uintr_receiver_info { u64 uvec; /* Vector number */ }; +struct uintr_sender_info { + struct list_head node; + struct uintr_uitt_ctx *uitt_ctx; + struct task_struct *task; + struct uintr_upid_ctx *r_upid_ctx; /* Receiver's UPID context */ + struct callback_head twork; /* Task work head */ + unsigned int uitt_index; +}; + bool uintr_arch_enabled(void); int do_uintr_register_handler(u64 handler); int do_uintr_unregister_handler(void); int do_uintr_register_vector(struct uintr_receiver_info *r_info); void do_uintr_unregister_vector(struct uintr_receiver_info *r_info); +int do_uintr_register_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info); +void do_uintr_unregister_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info); + void uintr_free(struct task_struct *task); /* TODO: Inline the context switch related functions */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 83677f76bd7b..9db33e467b30 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -92,6 +92,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_X86_USER_INTERRUPTS /* User Interrupt state is unique for each task */ dst->thread.ui_recv = NULL; + dst->thread.ui_send = NULL; #endif return fpu_clone(dst); diff --git 
a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index 9dcb9f60e5bc..8f331c5fe0cf 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -21,6 +21,11 @@ #include #include +/* + * Each UITT entry is 16 bytes in size. + * Current UITT table size is set as 4KB (256 * 16 bytes) + */ +#define UINTR_MAX_UITT_NR 256 #define UINTR_MAX_UVEC_NR 64 /* User Posted Interrupt Descriptor (UPID) */ @@ -44,6 +49,27 @@ struct uintr_receiver { u64 uvec_mask; /* track active vector per bit */ }; +/* User Interrupt Target Table Entry (UITTE) */ +struct uintr_uitt_entry { + u8 valid; /* bit 0: valid, bit 1-7: reserved */ + u8 user_vec; + u8 reserved[6]; + u64 target_upid_addr; +} __packed __aligned(16); + +struct uintr_uitt_ctx { + struct uintr_uitt_entry *uitt; + /* Protect UITT */ + spinlock_t uitt_lock; + refcount_t refs; +}; + +struct uintr_sender { + struct uintr_uitt_ctx *uitt_ctx; + /* track active uitt entries per bit */ + u64 uitt_mask[BITS_TO_U64(UINTR_MAX_UITT_NR)]; +}; + inline bool uintr_arch_enabled(void) { return static_cpu_has(X86_FEATURE_UINTR); @@ -54,6 +80,36 @@ static inline bool is_uintr_receiver(struct task_struct *t) return !!t->thread.ui_recv; } +static inline bool is_uintr_sender(struct task_struct *t) +{ + return !!t->thread.ui_send; +} + +static inline bool is_uintr_task(struct task_struct *t) +{ + return(is_uintr_receiver(t) || is_uintr_sender(t)); +} + +static inline bool is_uitt_empty(struct task_struct *t) +{ + return !!bitmap_empty((unsigned long *)t->thread.ui_send->uitt_mask, + UINTR_MAX_UITT_NR); +} + +/* + * No lock is needed to read the active flag. Writes only happen from + * r_info->task that owns the UPID. Everyone else would just read this flag. + * + * This only provides a static check. The receiver may become inactive right + * after this check. The primary reason to have this check is to prevent future + * senders from connecting with this UPID, since the receiver task has already + * made this UPID inactive. 
+ */ +static bool uintr_is_receiver_active(struct uintr_receiver_info *r_info) +{ + return r_info->upid_ctx->receiver_active; +} + static inline u32 cpu_to_ndst(int cpu) { u32 apicid = (u32)apic->cpu_present_to_apicid(cpu); @@ -94,6 +150,7 @@ static struct uintr_upid_ctx *alloc_upid(void) upid_ctx->upid = upid; refcount_set(&upid_ctx->refs, 1); upid_ctx->task = get_task_struct(current); + upid_ctx->receiver_active = true; return upid_ctx; } @@ -110,6 +167,64 @@ static struct uintr_upid_ctx *get_upid_ref(struct uintr_upid_ctx *upid_ctx) return upid_ctx; } +static void free_uitt(struct uintr_uitt_ctx *uitt_ctx) +{ + unsigned long flags; + + spin_lock_irqsave(&uitt_ctx->uitt_lock, flags); + kfree(uitt_ctx->uitt); + uitt_ctx->uitt = NULL; + spin_unlock_irqrestore(&uitt_ctx->uitt_lock, flags); + + kfree(uitt_ctx); +} + +/* TODO: Replace UITT allocation with KPTI compatible memory allocator */ +static struct uintr_uitt_ctx *alloc_uitt(void) +{ + struct uintr_uitt_ctx *uitt_ctx; + struct uintr_uitt_entry *uitt; + + uitt_ctx = kzalloc(sizeof(*uitt_ctx), GFP_KERNEL); + if (!uitt_ctx) + return NULL; + + uitt = kzalloc(sizeof(*uitt) * UINTR_MAX_UITT_NR, GFP_KERNEL); + if (!uitt) { + kfree(uitt_ctx); + return NULL; + } + + uitt_ctx->uitt = uitt; + spin_lock_init(&uitt_ctx->uitt_lock); + refcount_set(&uitt_ctx->refs, 1); + + return uitt_ctx; +} + +static void put_uitt_ref(struct uintr_uitt_ctx *uitt_ctx) +{ + if (refcount_dec_and_test(&uitt_ctx->refs)) + free_uitt(uitt_ctx); +} + +static struct uintr_uitt_ctx *get_uitt_ref(struct uintr_uitt_ctx *uitt_ctx) +{ + refcount_inc(&uitt_ctx->refs); + return uitt_ctx; +} + +static inline void mark_uitte_invalid(struct uintr_sender_info *s_info) +{ + struct uintr_uitt_entry *uitte; + unsigned long flags; + + spin_lock_irqsave(&s_info->uitt_ctx->uitt_lock, flags); + uitte = &s_info->uitt_ctx->uitt[s_info->uitt_index]; + uitte->valid = 0; + spin_unlock_irqrestore(&s_info->uitt_ctx->uitt_lock, flags); +} + static void __clear_vector_from_upid(u64 uvec, struct uintr_upid *upid) { clear_bit(uvec, (unsigned long *)&upid->puir); @@ -175,6 +290,210 @@ static void receiver_clear_uvec(struct callback_head *head) kfree(r_info); } +static void teardown_uitt(void) +{ + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + u64 msr64; + + put_uitt_ref(t->thread.ui_send->uitt_ctx); + kfree(t->thread.ui_send); + t->thread.ui_send = NULL; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, msr64); + msr64 &= GENMASK_ULL(63, 32); + wrmsrl(MSR_IA32_UINTR_MISC, msr64); + wrmsrl(MSR_IA32_UINTR_TT, 0ULL); + } else { + struct uintr_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->uitt_size = 0; + p->uitt_addr = 0; + } + } + + fpregs_unlock(); +} + +static int init_uitt(void) +{ + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + struct uintr_sender *ui_send; + u64 msr64; + + ui_send = kzalloc(sizeof(*t->thread.ui_send), GFP_KERNEL); + if (!ui_send) + return -ENOMEM; + + ui_send->uitt_ctx = alloc_uitt(); + if (!ui_send->uitt_ctx) { + pr_debug("send: Alloc UITT failed for task=%d\n", t->pid); + kfree(ui_send); + return -ENOMEM; + } + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + wrmsrl(MSR_IA32_UINTR_TT, (u64)ui_send->uitt_ctx->uitt | 1); + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, msr64); + msr64 &= GENMASK_ULL(63, 32); + msr64 |= 
UINTR_MAX_UITT_NR; + wrmsrl(MSR_IA32_UINTR_MISC, msr64); + } else { + struct xregs_state *xsave; + struct uintr_state *p; + + xsave = &fpu->state.xsave; + xsave->header.xfeatures |= XFEATURE_MASK_UINTR; + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->uitt_size = UINTR_MAX_UITT_NR; + p->uitt_addr = (u64)ui_send->uitt_ctx->uitt | 1; + } + } + + fpregs_unlock(); + + pr_debug("send: Setup a new UITT=%px for task=%d with size %d\n", + ui_send->uitt_ctx->uitt, t->pid, UINTR_MAX_UITT_NR * 16); + + t->thread.ui_send = ui_send; + + return 0; +} + +static void __free_uitt_entry(unsigned int entry) +{ + struct task_struct *t = current; + unsigned long flags; + + if (entry >= UINTR_MAX_UITT_NR) + return; + + if (!is_uintr_sender(t)) + return; + + pr_debug("send: Freeing UITTE entry %d for task=%d\n", entry, t->pid); + + spin_lock_irqsave(&t->thread.ui_send->uitt_ctx->uitt_lock, flags); + memset(&t->thread.ui_send->uitt_ctx->uitt[entry], 0, + sizeof(struct uintr_uitt_entry)); + spin_unlock_irqrestore(&t->thread.ui_send->uitt_ctx->uitt_lock, flags); + + clear_bit(entry, (unsigned long *)t->thread.ui_send->uitt_mask); + + if (is_uitt_empty(t)) { + pr_debug("send: UITT mask is empty. Dereference and teardown UITT\n"); + teardown_uitt(); + } +} + +static void sender_free_uitte(struct callback_head *head) +{ + struct uintr_sender_info *s_info; + + s_info = container_of(head, struct uintr_sender_info, twork); + + __free_uitt_entry(s_info->uitt_index); + put_uitt_ref(s_info->uitt_ctx); + put_upid_ref(s_info->r_upid_ctx); + put_task_struct(s_info->task); + kfree(s_info); +} + +void do_uintr_unregister_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info) +{ + int ret; + + /* + * To make sure any new senduipi result in a #GP fault. + * The task work might take non-zero time to kick the process out. + */ + mark_uitte_invalid(s_info); + + pr_debug("send: Adding Free UITTE %d task work for task=%d\n", + s_info->uitt_index, s_info->task->pid); + + init_task_work(&s_info->twork, sender_free_uitte); + ret = task_work_add(s_info->task, &s_info->twork, true); + if (ret) { + /* + * Dereferencing the UITT and UPID here since the task has + * exited. + */ + pr_debug("send: Free UITTE %d task=%d has already exited\n", + s_info->uitt_index, s_info->task->pid); + put_upid_ref(s_info->r_upid_ctx); + put_uitt_ref(s_info->uitt_ctx); + put_task_struct(s_info->task); + kfree(s_info); + return; + } +} + +int do_uintr_register_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info) +{ + struct uintr_uitt_entry *uitte = NULL; + struct uintr_sender *ui_send; + struct task_struct *t = current; + unsigned long flags; + int entry; + int ret; + + /* + * Only a static check. Receiver could exit anytime after this check. + * This check only prevents connections using uintr_fd after the + * receiver has already exited/unregistered. 
+ */ + if (!uintr_is_receiver_active(r_info)) + return -ESHUTDOWN; + + if (is_uintr_sender(t)) { + entry = find_first_zero_bit((unsigned long *)t->thread.ui_send->uitt_mask, + UINTR_MAX_UITT_NR); + if (entry >= UINTR_MAX_UITT_NR) + return -ENOSPC; + } else { + BUILD_BUG_ON(UINTR_MAX_UITT_NR < 1); + entry = 0; + ret = init_uitt(); + if (ret) + return ret; + } + + ui_send = t->thread.ui_send; + + set_bit(entry, (unsigned long *)ui_send->uitt_mask); + + spin_lock_irqsave(&ui_send->uitt_ctx->uitt_lock, flags); + uitte = &ui_send->uitt_ctx->uitt[entry]; + pr_debug("send: sender=%d receiver=%d UITTE entry %d address %px\n", + current->pid, r_info->upid_ctx->task->pid, entry, uitte); + + uitte->user_vec = r_info->uvec; + uitte->target_upid_addr = (u64)r_info->upid_ctx->upid; + uitte->valid = 1; + spin_unlock_irqrestore(&ui_send->uitt_ctx->uitt_lock, flags); + + s_info->r_upid_ctx = get_upid_ref(r_info->upid_ctx); + s_info->uitt_ctx = get_uitt_ref(ui_send->uitt_ctx); + s_info->task = get_task_struct(current); + s_info->uitt_index = entry; + + return 0; +} + int do_uintr_unregister_handler(void) { struct task_struct *t = current; @@ -222,6 +541,8 @@ int do_uintr_unregister_handler(void) } ui_recv = t->thread.ui_recv; + ui_recv->upid_ctx->receiver_active = false; + /* * Suppress notifications so that no further interrupts are generated * based on this UPID. @@ -437,14 +758,14 @@ void switch_uintr_return(void) * This should only be called from exit_thread(). * exit_thread() can happen in current context when the current thread is * exiting or it can happen for a new thread that is being created. - * For new threads is_uintr_receiver() should fail. + * For new threads is_uintr_task() should fail. */ void uintr_free(struct task_struct *t) { struct uintr_receiver *ui_recv; struct fpu *fpu; - if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_receiver(t)) + if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_task(t)) return; if (WARN_ON_ONCE(t != current)) @@ -456,6 +777,7 @@ void uintr_free(struct task_struct *t) if (fpregs_state_valid(fpu, smp_processor_id())) { wrmsrl(MSR_IA32_UINTR_MISC, 0ULL); + wrmsrl(MSR_IA32_UINTR_TT, 0ULL); wrmsrl(MSR_IA32_UINTR_PD, 0ULL); wrmsrl(MSR_IA32_UINTR_RR, 0ULL); wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL); @@ -470,20 +792,31 @@ void uintr_free(struct task_struct *t) p->upid_addr = 0; p->stack_adjust = 0; p->uinv = 0; + p->uitt_addr = 0; + p->uitt_size = 0; } } /* Check: Can a thread be context switched while it is exiting? */ - ui_recv = t->thread.ui_recv; + if (is_uintr_receiver(t)) { + ui_recv = t->thread.ui_recv; - /* - * Suppress notifications so that no further interrupts are - * generated based on this UPID. - */ - set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); - put_upid_ref(ui_recv->upid_ctx); - kfree(ui_recv); - t->thread.ui_recv = NULL; + /* + * Suppress notifications so that no further interrupts are + * generated based on this UPID. 
+ */ + set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); + ui_recv->upid_ctx->receiver_active = false; + put_upid_ref(ui_recv->upid_ctx); + kfree(ui_recv); + t->thread.ui_recv = NULL; + } fpregs_unlock(); + + if (is_uintr_sender(t)) { + put_uitt_ref(t->thread.ui_send->uitt_ctx); + kfree(t->thread.ui_send); + t->thread.ui_send = NULL; + } } diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c index f0548bbac776..3c82c032c0b9 100644 --- a/arch/x86/kernel/uintr_fd.c +++ b/arch/x86/kernel/uintr_fd.c @@ -15,6 +15,9 @@ struct uintrfd_ctx { struct uintr_receiver_info *r_info; + /* Protect sender_list */ + spinlock_t sender_lock; + struct list_head sender_list; }; #ifdef CONFIG_PROC_FS @@ -30,11 +33,20 @@ static void uintrfd_show_fdinfo(struct seq_file *m, struct file *file) static int uintrfd_release(struct inode *inode, struct file *file) { struct uintrfd_ctx *uintrfd_ctx = file->private_data; + struct uintr_sender_info *s_info, *tmp; + unsigned long flags; pr_debug("recv: Release uintrfd for r_task %d uvec %llu\n", uintrfd_ctx->r_info->upid_ctx->task->pid, uintrfd_ctx->r_info->uvec); + spin_lock_irqsave(&uintrfd_ctx->sender_lock, flags); + list_for_each_entry_safe(s_info, tmp, &uintrfd_ctx->sender_list, node) { + list_del(&s_info->node); + do_uintr_unregister_sender(uintrfd_ctx->r_info, s_info); + } + spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, flags); + do_uintr_unregister_vector(uintrfd_ctx->r_info); kfree(uintrfd_ctx); @@ -81,6 +93,9 @@ SYSCALL_DEFINE2(uintr_create_fd, u64, vector, unsigned int, flags) goto out_free_ctx; } + INIT_LIST_HEAD(&uintrfd_ctx->sender_list); + spin_lock_init(&uintrfd_ctx->sender_lock); + /* TODO: Get user input for flags - UFD_CLOEXEC */ /* Check: Do we need O_NONBLOCK? */ uintrfd = anon_inode_getfd("[uintrfd]", &uintrfd_fops, uintrfd_ctx, @@ -150,3 +165,121 @@ SYSCALL_DEFINE1(uintr_unregister_handler, unsigned int, flags) return ret; } + +/* + * sys_uintr_register_sender - setup user inter-processor interrupt sender. + */ +SYSCALL_DEFINE2(uintr_register_sender, int, uintrfd, unsigned int, flags) +{ + struct uintr_sender_info *s_info; + struct uintrfd_ctx *uintrfd_ctx; + unsigned long lock_flags; + struct file *uintr_f; + struct fd f; + int ret = 0; + + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + f = fdget(uintrfd); + uintr_f = f.file; + if (!uintr_f) + return -EBADF; + + if (uintr_f->f_op != &uintrfd_fops) { + ret = -EOPNOTSUPP; + goto out_fdput; + } + + uintrfd_ctx = (struct uintrfd_ctx *)uintr_f->private_data; + + spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags); + list_for_each_entry(s_info, &uintrfd_ctx->sender_list, node) { + if (s_info->task == current) { + ret = -EISCONN; + break; + } + } + spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags); + + if (ret) + goto out_fdput; + + s_info = kzalloc(sizeof(*s_info), GFP_KERNEL); + if (!s_info) { + ret = -ENOMEM; + goto out_fdput; + } + + ret = do_uintr_register_sender(uintrfd_ctx->r_info, s_info); + if (ret) { + kfree(s_info); + goto out_fdput; + } + + spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags); + list_add(&s_info->node, &uintrfd_ctx->sender_list); + spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags); + + ret = s_info->uitt_index; + +out_fdput: + pr_debug("send: register sender task=%d flags %d ret(uipi_id)=%d\n", + current->pid, flags, ret); + + fdput(f); + return ret; +} + +/* + * sys_uintr_unregister_sender - Unregister user inter-processor interrupt sender. 
+ */ +SYSCALL_DEFINE2(uintr_unregister_sender, int, uintrfd, unsigned int, flags) +{ + struct uintr_sender_info *s_info; + struct uintrfd_ctx *uintrfd_ctx; + struct file *uintr_f; + unsigned long lock_flags; + struct fd f; + int ret; + + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + f = fdget(uintrfd); + uintr_f = f.file; + if (!uintr_f) + return -EBADF; + + if (uintr_f->f_op != &uintrfd_fops) { + ret = -EOPNOTSUPP; + goto out_fdput; + } + + uintrfd_ctx = (struct uintrfd_ctx *)uintr_f->private_data; + + ret = -EINVAL; + spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags); + list_for_each_entry(s_info, &uintrfd_ctx->sender_list, node) { + if (s_info->task == current) { + ret = 0; + list_del(&s_info->node); + do_uintr_unregister_sender(uintrfd_ctx->r_info, s_info); + break; + } + } + spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags); + + pr_debug("send: unregister sender uintrfd %d for task=%d ret %d\n", + uintrfd, current->pid, ret); + +out_fdput: + fdput(f); + return ret; +} From patchwork Mon Sep 13 20:01:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 152F3C4321E for ; Mon, 13 Sep 2021 20:05:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03560610A6 for ; Mon, 13 Sep 2021 20:05:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347969AbhIMUGe (ORCPT ); Mon, 13 Sep 2021 16:06:34 -0400 Received: from mga05.intel.com ([192.55.52.43]:38697 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347974AbhIMUGH (ORCPT ); Mon, 13 Sep 2021 16:06:07 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="307336390" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="307336390" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643935" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:32 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 11/13] x86/uintr: Introduce uintr_wait() syscall Date: Mon, 13 Sep 2021 13:01:30 -0700 Message-Id: <20210913200132.3396598-12-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Add a new system call to allow applications to block in the kernel and wait for user interrupts. When the application makes this syscall the notification vector is switched to a new kernel vector. Any new SENDUIPI will invoke the kernel interrupt which is then used to wake up the process. Currently, the task wait list is global one. To make the implementation scalable there is a need to move to a distributed per-cpu wait list. Signed-off-by: Sohil Mehta --- arch/x86/include/asm/hardirq.h | 1 + arch/x86/include/asm/idtentry.h | 1 + arch/x86/include/asm/irq_vectors.h | 3 +- arch/x86/include/asm/uintr.h | 22 +++++++ arch/x86/kernel/idt.c | 1 + arch/x86/kernel/irq.c | 18 ++++++ arch/x86/kernel/uintr_core.c | 94 ++++++++++++++++++++++++------ arch/x86/kernel/uintr_fd.c | 15 +++++ 8 files changed, 136 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 279afc01f1ac..a4623fdb65a1 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -22,6 +22,7 @@ typedef struct { #endif #ifdef CONFIG_X86_USER_INTERRUPTS unsigned int uintr_spurious_count; + unsigned int uintr_kernel_notifications; #endif unsigned int x86_platform_ipis; /* arch dependent */ unsigned int apic_perf_irqs; diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 5929a6f9eeee..0ac7ef592283 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -673,6 +673,7 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR, sysvec_kvm_posted_intr_nested #ifdef CONFIG_X86_USER_INTERRUPTS DECLARE_IDTENTRY_SYSVEC(UINTR_NOTIFICATION_VECTOR, sysvec_uintr_spurious_interrupt); +DECLARE_IDTENTRY_SYSVEC(UINTR_KERNEL_VECTOR, sysvec_uintr_kernel_notification); #endif #if IS_ENABLED(CONFIG_HYPERV) diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index d26faa504931..1d289b3ee0da 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -106,8 +106,9 @@ /* Vector for User interrupt notifications */ #define UINTR_NOTIFICATION_VECTOR 0xec +#define UINTR_KERNEL_VECTOR 0xeb -#define LOCAL_TIMER_VECTOR 0xeb +#define LOCAL_TIMER_VECTOR 0xea #define NR_VECTORS 256 diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index ef3521dd7fb9..64113ef523ca 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -4,11 +4,29 @@ #ifdef CONFIG_X86_USER_INTERRUPTS +/* User Posted Interrupt Descriptor (UPID) */ +struct uintr_upid { + struct { + u8 status; /* bit 0: ON, bit 1: SN, bit 2-7: reserved */ + u8 reserved1; /* Reserved */ + u8 nv; /* Notification vector */ + u8 reserved2; /* Reserved */ + u32 ndst; /* Notification destination */ + } nc __packed; /* 
Notification control */ + u64 puir; /* Posted user interrupt requests */ +} __aligned(64); + +/* UPID Notification control status */ +#define UPID_ON 0x0 /* Outstanding notification */ +#define UPID_SN 0x1 /* Suppressed notification */ + struct uintr_upid_ctx { + struct list_head node; struct task_struct *task; /* Receiver task */ struct uintr_upid *upid; refcount_t refs; bool receiver_active; /* Flag for UPID being mapped to a receiver */ + bool waiting; }; struct uintr_receiver_info { @@ -43,11 +61,15 @@ void uintr_free(struct task_struct *task); void switch_uintr_prepare(struct task_struct *prev); void switch_uintr_return(void); +int uintr_receiver_wait(void); +void uintr_wake_up_process(void); + #else /* !CONFIG_X86_USER_INTERRUPTS */ static inline void uintr_free(struct task_struct *task) {} static inline void switch_uintr_prepare(struct task_struct *prev) {} static inline void switch_uintr_return(void) {} +static inline void uintr_wake_up_process(void) {} #endif /* CONFIG_X86_USER_INTERRUPTS */ diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index d8c45e0728f0..8d4fd7509523 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -149,6 +149,7 @@ static const __initconst struct idt_data apic_idts[] = { # endif #ifdef CONFIG_X86_USER_INTERRUPTS INTG(UINTR_NOTIFICATION_VECTOR, asm_sysvec_uintr_spurious_interrupt), + INTG(UINTR_KERNEL_VECTOR, asm_sysvec_uintr_kernel_notification), #endif # ifdef CONFIG_IRQ_WORK INTG(IRQ_WORK_VECTOR, asm_sysvec_irq_work), diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index e3c35668c7c5..22349f5c301b 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -22,6 +22,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -187,6 +188,11 @@ int arch_show_interrupts(struct seq_file *p, int prec) for_each_online_cpu(j) seq_printf(p, "%10u ", irq_stats(j)->uintr_spurious_count); seq_puts(p, " User-interrupt spurious event\n"); + + seq_printf(p, "%*s: ", prec, "UKN"); + for_each_online_cpu(j) + seq_printf(p, "%10u ", irq_stats(j)->uintr_kernel_notifications); + seq_puts(p, " User-interrupt kernel notification event\n"); #endif return 0; } @@ -356,6 +362,18 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_uintr_spurious_interrupt) ack_APIC_irq(); inc_irq_stat(uintr_spurious_count); } + +/* + * Handler for UINTR_KERNEL_VECTOR. 
+ */ +DEFINE_IDTENTRY_SYSVEC(sysvec_uintr_kernel_notification) +{ + /* TODO: Add entry-exit tracepoints */ + ack_APIC_irq(); + inc_irq_stat(uintr_kernel_notifications); + + uintr_wake_up_process(); +} #endif diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index 8f331c5fe0cf..4e5545e6d903 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -28,22 +28,6 @@ #define UINTR_MAX_UITT_NR 256 #define UINTR_MAX_UVEC_NR 64 -/* User Posted Interrupt Descriptor (UPID) */ -struct uintr_upid { - struct { - u8 status; /* bit 0: ON, bit 1: SN, bit 2-7: reserved */ - u8 reserved1; /* Reserved */ - u8 nv; /* Notification vector */ - u8 reserved2; /* Reserved */ - u32 ndst; /* Notification destination */ - } nc __packed; /* Notification control */ - u64 puir; /* Posted user interrupt requests */ -} __aligned(64); - -/* UPID Notification control status */ -#define UPID_ON 0x0 /* Outstanding notification */ -#define UPID_SN 0x1 /* Suppressed notification */ - struct uintr_receiver { struct uintr_upid_ctx *upid_ctx; u64 uvec_mask; /* track active vector per bit */ @@ -70,6 +54,10 @@ struct uintr_sender { u64 uitt_mask[BITS_TO_U64(UINTR_MAX_UITT_NR)]; }; +/* TODO: To remove the global lock, move to a per-cpu wait list. */ +static DEFINE_SPINLOCK(uintr_wait_lock); +static struct list_head uintr_wait_list = LIST_HEAD_INIT(uintr_wait_list); + inline bool uintr_arch_enabled(void) { return static_cpu_has(X86_FEATURE_UINTR); @@ -80,6 +68,12 @@ static inline bool is_uintr_receiver(struct task_struct *t) return !!t->thread.ui_recv; } +/* Always make sure task is_uintr_receiver() before calling */ +static inline bool is_uintr_waiting(struct task_struct *t) +{ + return t->thread.ui_recv->upid_ctx->waiting; +} + static inline bool is_uintr_sender(struct task_struct *t) { return !!t->thread.ui_send; @@ -151,6 +145,7 @@ static struct uintr_upid_ctx *alloc_upid(void) refcount_set(&upid_ctx->refs, 1); upid_ctx->task = get_task_struct(current); upid_ctx->receiver_active = true; + upid_ctx->waiting = false; return upid_ctx; } @@ -494,6 +489,68 @@ int do_uintr_register_sender(struct uintr_receiver_info *r_info, return 0; } +int uintr_receiver_wait(void) +{ + struct uintr_upid_ctx *upid_ctx; + unsigned long flags; + + if (!is_uintr_receiver(current)) + return -EOPNOTSUPP; + + upid_ctx = current->thread.ui_recv->upid_ctx; + upid_ctx->upid->nc.nv = UINTR_KERNEL_VECTOR; + upid_ctx->waiting = true; + spin_lock_irqsave(&uintr_wait_lock, flags); + list_add(&upid_ctx->node, &uintr_wait_list); + spin_unlock_irqrestore(&uintr_wait_lock, flags); + + set_current_state(TASK_INTERRUPTIBLE); + schedule(); + + return -EINTR; +} + +/* + * Runs in interrupt context. + * Scan through all UPIDs to check if any interrupt is on going. 
+ */ +void uintr_wake_up_process(void) +{ + struct uintr_upid_ctx *upid_ctx, *tmp; + unsigned long flags; + + spin_lock_irqsave(&uintr_wait_lock, flags); + list_for_each_entry_safe(upid_ctx, tmp, &uintr_wait_list, node) { + if (test_bit(UPID_ON, (unsigned long *)&upid_ctx->upid->nc.status)) { + set_bit(UPID_SN, (unsigned long *)&upid_ctx->upid->nc.status); + upid_ctx->upid->nc.nv = UINTR_NOTIFICATION_VECTOR; + upid_ctx->waiting = false; + wake_up_process(upid_ctx->task); + list_del(&upid_ctx->node); + } + } + spin_unlock_irqrestore(&uintr_wait_lock, flags); +} + +/* Called when task is unregistering/exiting */ +static void uintr_remove_task_wait(struct task_struct *task) +{ + struct uintr_upid_ctx *upid_ctx, *tmp; + unsigned long flags; + + spin_lock_irqsave(&uintr_wait_lock, flags); + list_for_each_entry_safe(upid_ctx, tmp, &uintr_wait_list, node) { + if (upid_ctx->task == task) { + pr_debug("wait: Removing task %d from wait\n", + upid_ctx->task->pid); + upid_ctx->upid->nc.nv = UINTR_NOTIFICATION_VECTOR; + upid_ctx->waiting = false; + list_del(&upid_ctx->node); + } + } + spin_unlock_irqrestore(&uintr_wait_lock, flags); +} + int do_uintr_unregister_handler(void) { struct task_struct *t = current; @@ -548,7 +605,7 @@ int do_uintr_unregister_handler(void) * based on this UPID. */ set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); - + uintr_remove_task_wait(t); put_upid_ref(ui_recv->upid_ctx); kfree(ui_recv); t->thread.ui_recv = NULL; @@ -684,7 +741,7 @@ void switch_uintr_prepare(struct task_struct *prev) { struct uintr_upid *upid; - if (is_uintr_receiver(prev)) { + if (is_uintr_receiver(prev) && !is_uintr_waiting(prev)) { upid = prev->thread.ui_recv->upid_ctx->upid; set_bit(UPID_SN, (unsigned long *)&upid->nc.status); } @@ -806,6 +863,7 @@ void uintr_free(struct task_struct *t) * generated based on this UPID. 
*/ set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status); + uintr_remove_task_wait(t); ui_recv->upid_ctx->receiver_active = false; put_upid_ref(ui_recv->upid_ctx); kfree(ui_recv); diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c index 3c82c032c0b9..a7e55d98c0c7 100644 --- a/arch/x86/kernel/uintr_fd.c +++ b/arch/x86/kernel/uintr_fd.c @@ -283,3 +283,18 @@ SYSCALL_DEFINE2(uintr_unregister_sender, int, uintrfd, unsigned int, flags) fdput(f); return ret; } + +/* + * sys_uintr_wait - Wait for a user interrupt + */ +SYSCALL_DEFINE1(uintr_wait, unsigned int, flags) +{ + if (!uintr_arch_enabled()) + return -EOPNOTSUPP; + + if (flags) + return -EINVAL; + + /* TODO: Add a timeout option */ + return uintr_receiver_wait(); +} From patchwork Mon Sep 13 20:01:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490715 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85313C433FE for ; Mon, 13 Sep 2021 20:04:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 757336108B for ; Mon, 13 Sep 2021 20:04:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347904AbhIMUFz (ORCPT ); Mon, 13 Sep 2021 16:05:55 -0400 Received: from mga03.intel.com ([134.134.136.65]:19216 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347824AbhIMUFv (ORCPT ); Mon, 13 Sep 2021 16:05:51 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="221830804" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="221830804" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643940" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:33 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 12/13] x86/uintr: Wire up the user interrupt syscalls Date: Mon, 13 Sep 2021 13:01:31 -0700 Message-Id: <20210913200132.3396598-13-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Wire up the user interrupt receiver and sender related syscalls for x86_64. 
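Until libc grows wrappers for these entry points, userspace is expected to reach them through syscall(2). Below is a minimal, hypothetical receiver-side sketch (not part of this patch) using the x86_64 numbers added in the tables that follow; the interrupt handler itself must be built with a -muintr capable compiler, as in the selftest later in this series.

#include <unistd.h>
#include <sys/syscall.h>

/* x86_64 syscall numbers added by this patch; not in libc headers yet */
#define __NR_uintr_register_handler    449
#define __NR_uintr_create_fd           451
#define __NR_uintr_wait                454

/* Hypothetical helper: register the per-thread handler and create a
 * uintr_fd for vector 0. Returns the uintr_fd, or -1 on error.
 */
static int uintr_receiver_setup(void *handler)
{
        if (syscall(__NR_uintr_register_handler, handler, 0))
                return -1;

        return syscall(__NR_uintr_create_fd, 0 /* vector */, 0 /* flags */);
}

/* Hypothetical helper: block in the kernel until a user interrupt is
 * delivered to this thread (uintr_wait() from the previous patch).
 */
static void uintr_receiver_block(void)
{
        syscall(__NR_uintr_wait, 0);
}

A sender thread would pass the returned uintr_fd to uintr_register_sender() to obtain a uipi_index for SENDUIPI; the selftest at the end of this series exercises the same calls through wrapper macros.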
For rest of the architectures the syscalls are not implemented. Signed-off-by: Sohil Mehta --- arch/x86/entry/syscalls/syscall_32.tbl | 6 ++++++ arch/x86/entry/syscalls/syscall_64.tbl | 6 ++++++ include/linux/syscalls.h | 8 ++++++++ include/uapi/asm-generic/unistd.h | 15 ++++++++++++++- kernel/sys_ni.c | 8 ++++++++ scripts/checksyscalls.sh | 6 ++++++ 6 files changed, 48 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 960a021d543e..d0e97f1f1173 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -453,3 +453,9 @@ 446 i386 landlock_restrict_self sys_landlock_restrict_self 447 i386 memfd_secret sys_memfd_secret 448 i386 process_mrelease sys_process_mrelease +449 i386 uintr_register_handler sys_uintr_register_handler +450 i386 uintr_unregister_handler sys_uintr_unregister_handler +451 i386 uintr_create_fd sys_uintr_create_fd +452 i386 uintr_register_sender sys_uintr_register_sender +453 i386 uintr_unregister_sender sys_uintr_unregister_sender +454 i386 uintr_wait sys_uintr_wait diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 18b5500ea8bf..444af44e5947 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -370,6 +370,12 @@ 446 common landlock_restrict_self sys_landlock_restrict_self 447 common memfd_secret sys_memfd_secret 448 common process_mrelease sys_process_mrelease +449 common uintr_register_handler sys_uintr_register_handler +450 common uintr_unregister_handler sys_uintr_unregister_handler +451 common uintr_create_fd sys_uintr_create_fd +452 common uintr_register_sender sys_uintr_register_sender +453 common uintr_unregister_sender sys_uintr_unregister_sender +454 common uintr_wait sys_uintr_wait # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 252243c7783d..f47f64c36d87 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1060,6 +1060,14 @@ asmlinkage long sys_memfd_secret(unsigned int flags); /* arch/x86/kernel/ioport.c */ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on); +/* arch/x86/kernel/uintr_fd.c */ +asmlinkage long sys_uintr_register_handler(u64 __user *handler, unsigned int flags); +asmlinkage long sys_uintr_unregister_handler(unsigned int flags); +asmlinkage long sys_uintr_create_fd(u64 vector, unsigned int flags); +asmlinkage long sys_uintr_register_sender(int uintr_fd, unsigned int flags); +asmlinkage long sys_uintr_unregister_sender(int uintr_fd, unsigned int flags); +asmlinkage long sys_uintr_wait(unsigned int flags); + /* pciconfig: alpha, arm, arm64, ia64, sparc */ asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn, unsigned long off, unsigned long len, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1c5fb86d455a..b9a8b344270a 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -880,8 +880,21 @@ __SYSCALL(__NR_memfd_secret, sys_memfd_secret) #define __NR_process_mrelease 448 __SYSCALL(__NR_process_mrelease, sys_process_mrelease) +#define __NR_uintr_register_handler 449 +__SYSCALL(__NR_uintr_register_handler, sys_uintr_register_handler) +#define __NR_uintr_unregister_handler 450 +__SYSCALL(__NR_uintr_unregister_handler, sys_uintr_unregister_handler) +#define __NR_uintr_create_fd 451 
+__SYSCALL(__NR_uintr_create_fd, sys_uintr_create_fd) +#define __NR_uintr_register_sender 452 +__SYSCALL(__NR_uintr_register_sender, sys_uintr_register_sender) +#define __NR_uintr_unregister_sender 453 +__SYSCALL(__NR_uintr_unregister_sender, sys_uintr_unregister_sender) +#define __NR_uintr_wait 454 +__SYSCALL(__NR_uintr_wait, sys_uintr_wait) + #undef __NR_syscalls -#define __NR_syscalls 449 +#define __NR_syscalls 455 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index f43d89d92860..5d8b92ac197b 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -357,6 +357,14 @@ COND_SYSCALL(pkey_free); /* memfd_secret */ COND_SYSCALL(memfd_secret); +/* user interrupts */ +COND_SYSCALL(uintr_register_handler); +COND_SYSCALL(uintr_unregister_handler); +COND_SYSCALL(uintr_create_fd); +COND_SYSCALL(uintr_register_sender); +COND_SYSCALL(uintr_unregister_sender); +COND_SYSCALL(uintr_wait); + /* * Architecture specific weak syscall entries. */ diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh index fd9777f63f14..0969580d829c 100755 --- a/scripts/checksyscalls.sh +++ b/scripts/checksyscalls.sh @@ -204,6 +204,12 @@ cat << EOF #define __IGNORE__sysctl #define __IGNORE_arch_prctl #define __IGNORE_nfsservctl +#define __IGNORE_uintr_register_handler +#define __IGNORE_uintr_unregister_handler +#define __IGNORE_uintr_create_fd +#define __IGNORE_uintr_register_sender +#define __IGNORE_uintr_unregister_sender +#define __IGNORE_uintr_wait /* ... including the "new" 32-bit uid syscalls */ #define __IGNORE_lchown32 From patchwork Mon Sep 13 20:01:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sohil Mehta X-Patchwork-Id: 12490727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97057C4321E for ; Mon, 13 Sep 2021 20:04:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 82813610FE for ; Mon, 13 Sep 2021 20:04:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348047AbhIMUGN (ORCPT ); Mon, 13 Sep 2021 16:06:13 -0400 Received: from mga03.intel.com ([134.134.136.65]:19216 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347866AbhIMUFx (ORCPT ); Mon, 13 Sep 2021 16:05:53 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10106"; a="221830808" X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="221830808" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Sep 2021 13:04:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,290,1624345200"; d="scan'208";a="469643943" Received: from sohilbuildbox.sc.intel.com (HELO localhost.localdomain) ([172.25.110.4]) by fmsmga007.fm.intel.com with ESMTP; 13 Sep 2021 13:04:33 -0700 From: Sohil Mehta To: x86@kernel.org Cc: Sohil Mehta , Tony Luck , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Andy Lutomirski , Jens Axboe , Christian Brauner , Peter Zijlstra , Shuah Khan , Arnd Bergmann , Jonathan Corbet , Ashok Raj , Jacob Pan , Gayatri Kammela , Zeng Guang , Dan Williams , Randy E Witt , Ravi V Shankar , Ramesh Thomas , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 13/13] selftests/x86: Add basic tests for User IPI Date: Mon, 13 Sep 2021 13:01:32 -0700 Message-Id: <20210913200132.3396598-14-sohil.mehta@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com> References: <20210913200132.3396598-1-sohil.mehta@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Include 2 basic tests for receiving a User IPI: 1. Receiver is spinning in userspace. 2. Receiver is blocked in the kernel. The selftests need gcc with 'muintr' support to compile. GCC 11 (recently released) has support for this. Signed-off-by: Sohil Mehta --- tools/testing/selftests/x86/Makefile | 10 ++ tools/testing/selftests/x86/uintr.c | 147 +++++++++++++++++++++++++++ 2 files changed, 157 insertions(+) create mode 100644 tools/testing/selftests/x86/uintr.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index b4142cd1c5c2..38588221b09e 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -9,6 +9,7 @@ UNAME_M := $(shell uname -m) CAN_BUILD_I386 := $(shell ./check_cc.sh $(CC) trivial_32bit_program.c -m32) CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c) CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) +CAN_BUILD_UINTR := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c -muintr) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ @@ -19,6 +20,11 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ corrupt_xstate_header + +ifeq ($(CAN_BUILD_UINTR),1) +TARGETS_C_64BIT_ONLY := $(TARGETS_C_64BIT_ONLY) uintr +endif + # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall @@ -41,6 +47,10 @@ ifeq ($(CAN_BUILD_WITH_NOPIE),1) CFLAGS += -no-pie endif +ifeq ($(CAN_BUILD_UINTR),1) +CFLAGS += -muintr +endif + define gen-target-rule-32 $(1) $(1)_32: $(OUTPUT)/$(1)_32 .PHONY: $(1) $(1)_32 diff --git a/tools/testing/selftests/x86/uintr.c b/tools/testing/selftests/x86/uintr.c new file mode 100644 index 000000000000..61a53526f2fa --- /dev/null +++ b/tools/testing/selftests/x86/uintr.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2020, Intel Corporation. 
+ * + * Sohil Mehta + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include + +#ifndef __x86_64__ +# error This test is 64-bit only +#endif + +#ifndef __NR_uintr_register_handler +#define __NR_uintr_register_handler 449 +#define __NR_uintr_unregister_handler 450 +#define __NR_uintr_create_fd 451 +#define __NR_uintr_register_sender 452 +#define __NR_uintr_unregister_sender 453 +#define __NR_uintr_wait 454 +#endif + +#define uintr_register_handler(handler, flags) syscall(__NR_uintr_register_handler, handler, flags) +#define uintr_unregister_handler(flags) syscall(__NR_uintr_unregister_handler, flags) +#define uintr_create_fd(vector, flags) syscall(__NR_uintr_create_fd, vector, flags) +#define uintr_register_sender(fd, flags) syscall(__NR_uintr_register_sender, fd, flags) +#define uintr_unregister_sender(fd, flags) syscall(__NR_uintr_unregister_sender, fd, flags) +#define uintr_wait(flags) syscall(__NR_uintr_wait, flags) + +unsigned long uintr_received; +unsigned int uintr_fd; + +void __attribute__((interrupt))__attribute__((target("general-regs-only", "inline-all-stringops"))) +uintr_handler(struct __uintr_frame *ui_frame, + unsigned long long vector) +{ + uintr_received = 1; +} + +void receiver_setup_interrupt(void) +{ + int vector = 0; + int ret; + + /* Register interrupt handler */ + if (uintr_register_handler(uintr_handler, 0)) { + printf("[FAIL]\tInterrupt handler register error\n"); + exit(EXIT_FAILURE); + } + + /* Create uintr_fd */ + ret = uintr_create_fd(vector, 0); + if (ret < 0) { + printf("[FAIL]\tInterrupt vector registration error\n"); + exit(EXIT_FAILURE); + } + + uintr_fd = ret; +} + +void *sender_thread(void *arg) +{ + long sleep_usec = (long)arg; + int uipi_index; + + uipi_index = uintr_register_sender(uintr_fd, 0); + if (uipi_index < 0) { + printf("[FAIL]\tSender register error\n"); + return NULL; + } + + /* Sleep before sending IPI to allow the receiver to block in the kernel */ + if (sleep_usec) + usleep(sleep_usec); + + printf("\tother thread: sending IPI\n"); + _senduipi(uipi_index); + + uintr_unregister_sender(uintr_fd, 0); + + return NULL; +} + +static inline void cpu_relax(void) +{ + asm volatile("rep; nop" ::: "memory"); +} + +void test_base_ipi(void) +{ + pthread_t pt; + + uintr_received = 0; + if (pthread_create(&pt, NULL, &sender_thread, NULL)) { + printf("[FAIL]\tError creating sender thread\n"); + return; + } + + printf("[RUN]\tSpin in userspace (waiting for interrupts)\n"); + // Keep spinning until interrupt received + while (!uintr_received) + cpu_relax(); + + printf("[OK]\tUser interrupt received\n"); +} + +void test_blocking_ipi(void) +{ + pthread_t pt; + long sleep_usec; + + uintr_received = 0; + sleep_usec = 1000; + if (pthread_create(&pt, NULL, &sender_thread, (void *)sleep_usec)) { + printf("[FAIL]\tError creating sender thread\n"); + return; + } + + printf("[RUN]\tBlock in the kernel (waiting for interrupts)\n"); + uintr_wait(0); + if (uintr_received) + printf("[OK]\tUser interrupt received\n"); + else + printf("[FAIL]\tUser interrupt not received\n"); +} + +int main(int argc, char *argv[]) +{ + receiver_setup_interrupt(); + + /* Enable interrupts */ + _stui(); + + test_base_ipi(); + + test_blocking_ipi(); + + close(uintr_fd); + uintr_unregister_handler(0); + + exit(EXIT_SUCCESS); +}
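A possible negative-path check that could be added to the selftest above (not part of this patch set): based on the -EISCONN return in uintr_fd.c earlier in this series, a repeat uintr_register_sender() on the same uintrfd from an already-connected thread should fail. The sketch below reuses the test's uintr_fd global and wrapper macros, and additionally assumes <errno.h> is included.

void test_duplicate_sender_registration(void)
{
        int uipi_index;

        /* First registration from this thread should return a valid uipi_index */
        uipi_index = uintr_register_sender(uintr_fd, 0);
        if (uipi_index < 0) {
                printf("[FAIL]\tSender register error\n");
                return;
        }

        /* A second registration on the same uintrfd by the same thread is
         * expected to be rejected with EISCONN (see uintr_fd.c).
         */
        if (uintr_register_sender(uintr_fd, 0) < 0 && errno == EISCONN)
                printf("[OK]\tDuplicate sender registration rejected\n");
        else
                printf("[FAIL]\tDuplicate sender registration not rejected\n");

        uintr_unregister_sender(uintr_fd, 0);
}

If added, this would be called from main() before uintr_fd is closed. With a GCC 11 (or newer) toolchain that supports -muintr, the test builds through the regular x86 selftests Makefile shown above.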