Message ID | 20240207105349.301-1-sthanneeru.opensrc@micron.com |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] cxl/pci: Set default timeout for background operations | expand |
sthanneeru.opensrc@ wrote: > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> > > The CXL 3.0 specification outlines background operations, > and support for handling these operations was added in following patch. > > Link: https://lore.kernel.org/all/20230523170927.20685-5-dave@stgolabs.net/ > > Mailbox commands like ‘Log Populate’ use background operations > to complete the execution of the command. > This can lead to a timeout, since there is currently no option > in the ioctl cxl_send_command structure to specify > a timeout value. The default values being zero can lead > to the driver reporting false negatives to the application. > > This patch aims to establish default values, enabling mailbox commands > that operate in the background to continue functioning even > if a timeout is not set in the userspace application. The reason there are no defaults is because userspace is not allowed to issue background commands. The CXL background command definition is awkward in that it allows a single command to monopolize the mailbox for an indefinite amount of time. Instead, the approach taken with Firmware Update and Sanitize is that a kernel sysfs ABI mediates access to the mailbox and facilitates bounded timeslices between command submissions. It effectively allows the kernel to manage fairness and more importantly preempt userspace if it needs to issue its own commands. I assume you are only seeing this lack of a default due to building with CONFIG_CXL_MEM_RAW_COMMANDS=y? If yes, "raw" means "raw" and the kernel is mostly taken out of the loop of saving userspace from itself. All that said, ugh, "Log Populate" has no facility to time bound the population of the log. I do not think it is tenable for Linux to surrender mailbox access for an indefinite uninterruptible amount of time... unless you want to handle "Log Populate" like Sanitize where the unbounded background operation is tolerated because the device is taken offline?
On Wed, 14 Feb 2024 19:31:23 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > sthanneeru.opensrc@ wrote: > > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> > > > > The CXL 3.0 specification outlines background operations, > > and support for handling these operations was added in following patch. > > > > Link: https://lore.kernel.org/all/20230523170927.20685-5-dave@stgolabs.net/ > > > > Mailbox commands like ‘Log Populate’ use background operations > > to complete the execution of the command. > > This can lead to a timeout, since there is currently no option > > in the ioctl cxl_send_command structure to specify > > a timeout value. The default values being zero can lead > > to the driver reporting false negatives to the application. > > > > This patch aims to establish default values, enabling mailbox commands > > that operate in the background to continue functioning even > > if a timeout is not set in the userspace application. > > The reason there are no defaults is because userspace is not allowed to > issue background commands. The CXL background command definition is > awkward in that it allows a single command to monopolize the mailbox for > an indefinite amount of time. > > Instead, the approach taken with Firmware Update and Sanitize is that a > kernel sysfs ABI mediates access to the mailbox and facilitates bounded > timeslices between command submissions. It effectively allows the kernel > to manage fairness and more importantly preempt userspace if it needs to > issue its own commands. > > I assume you are only seeing this lack of a default due to building with > CONFIG_CXL_MEM_RAW_COMMANDS=y? If yes, "raw" means "raw" and the kernel > is mostly taken out of the loop of saving userspace from itself. > > All that said, ugh, "Log Populate" has no facility to time bound the > population of the log. I do not think it is tenable for Linux to > surrender mailbox access for an indefinite uninterruptible amount of > time... unless you want to handle "Log Populate" like Sanitize where the > unbounded background operation is tolerated because the device is taken > offline? It may be pointless to do a component state dump only on an offline device. My assumption is this one is hardware debug only. Patches out of tree or behind a really scary sounding config variable perhaps? Other than vendor log I don't think populate log applies to the other log types yet (they don't mention it anyway!) Jonathan
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 4fd1f207c84e..82bdff55bb8d 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -40,6 +40,10 @@ /* CXL 2.0 - 8.2.8.4 */ #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ) +/* Default timeout for background operations */ +#define CXL_RC_BG_POLL_CNT 40 +#define CXL_RC_BG_POLL_INTERVAL_MS 1000 + /* * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to * dictate how long to wait for the mailbox to become ready. The new @@ -312,6 +316,13 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds, dev_dbg(dev, "Mailbox background operation (0x%04x) started\n", mbox_cmd->opcode); + /* + * Set default background operation timeout, if not set already. + */ + if (!mbox_cmd->poll_count) + mbox_cmd->poll_count = CXL_RC_BG_POLL_CNT; + if (!mbox_cmd->poll_interval_ms) + mbox_cmd->poll_interval_ms = CXL_RC_BG_POLL_INTERVAL_MS; timeout = mbox_cmd->poll_interval_ms; for (i = 0; i < mbox_cmd->poll_count; i++) {