diff mbox series

[RFC] cxl/pci: Set default timeout for background operations

Message ID 20240207105349.301-1-sthanneeru.opensrc@micron.com
State New, archived
Headers show
Series [RFC] cxl/pci: Set default timeout for background operations | expand

Commit Message

Srinivasulu Opensrc Feb. 7, 2024, 10:53 a.m. UTC
From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>

The CXL 3.0 specification outlines background operations,
and support for handling these operations was added in following patch.

Link: https://lore.kernel.org/all/20230523170927.20685-5-dave@stgolabs.net/

Mailbox commands like ‘Log Populate’ use background operations
to complete the execution of the command.
This can lead to a timeout, since there is currently no option
in the ioctl cxl_send_command structure to specify
a timeout value. The default values being zero can lead
to the driver reporting false negatives to the application.

This patch aims to establish default values, enabling mailbox commands
that operate in the background to continue functioning even
if a timeout is not set in the userspace application.

Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
---
 drivers/cxl/pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Dan Williams Feb. 15, 2024, 3:31 a.m. UTC | #1
sthanneeru.opensrc@ wrote:
> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> 
> The CXL 3.0 specification outlines background operations,
> and support for handling these operations was added in following patch.
> 
> Link: https://lore.kernel.org/all/20230523170927.20685-5-dave@stgolabs.net/
> 
> Mailbox commands like ‘Log Populate’ use background operations
> to complete the execution of the command.
> This can lead to a timeout, since there is currently no option
> in the ioctl cxl_send_command structure to specify
> a timeout value. The default values being zero can lead
> to the driver reporting false negatives to the application.
> 
> This patch aims to establish default values, enabling mailbox commands
> that operate in the background to continue functioning even
> if a timeout is not set in the userspace application.

The reason there are no defaults is because userspace is not allowed to
issue background commands. The CXL background command definition is
awkward in that it allows a single command to monopolize the mailbox for
an indefinite amount of time.

Instead, the approach taken with Firmware Update and Sanitize is that a
kernel sysfs ABI mediates access to the mailbox and facilitates bounded 
timeslices between command submissions. It effectively allows the kernel
to manage fairness and more importantly preempt userspace if it needs to
issue its own commands.

I assume you are only seeing this lack of a default due to building with
CONFIG_CXL_MEM_RAW_COMMANDS=y? If yes, "raw" means "raw" and the kernel
is mostly taken out of the loop of saving userspace from itself.

All that said, ugh, "Log Populate" has no facility to time bound the
population of the log. I do not think it is tenable for Linux to
surrender mailbox access for an indefinite uninterruptible amount of
time... unless you want to handle "Log Populate" like Sanitize where the
unbounded background operation is tolerated because the device is taken
offline?
Jonathan Cameron Feb. 15, 2024, 12:34 p.m. UTC | #2
On Wed, 14 Feb 2024 19:31:23 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> sthanneeru.opensrc@ wrote:
> > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> > 
> > The CXL 3.0 specification outlines background operations,
> > and support for handling these operations was added in following patch.
> > 
> > Link: https://lore.kernel.org/all/20230523170927.20685-5-dave@stgolabs.net/
> > 
> > Mailbox commands like ‘Log Populate’ use background operations
> > to complete the execution of the command.
> > This can lead to a timeout, since there is currently no option
> > in the ioctl cxl_send_command structure to specify
> > a timeout value. The default values being zero can lead
> > to the driver reporting false negatives to the application.
> > 
> > This patch aims to establish default values, enabling mailbox commands
> > that operate in the background to continue functioning even
> > if a timeout is not set in the userspace application.  
> 
> The reason there are no defaults is because userspace is not allowed to
> issue background commands. The CXL background command definition is
> awkward in that it allows a single command to monopolize the mailbox for
> an indefinite amount of time.
> 
> Instead, the approach taken with Firmware Update and Sanitize is that a
> kernel sysfs ABI mediates access to the mailbox and facilitates bounded 
> timeslices between command submissions. It effectively allows the kernel
> to manage fairness and more importantly preempt userspace if it needs to
> issue its own commands.
> 
> I assume you are only seeing this lack of a default due to building with
> CONFIG_CXL_MEM_RAW_COMMANDS=y? If yes, "raw" means "raw" and the kernel
> is mostly taken out of the loop of saving userspace from itself.
> 
> All that said, ugh, "Log Populate" has no facility to time bound the
> population of the log. I do not think it is tenable for Linux to
> surrender mailbox access for an indefinite uninterruptible amount of
> time... unless you want to handle "Log Populate" like Sanitize where the
> unbounded background operation is tolerated because the device is taken
> offline?

It may be pointless to do a component state dump only on an offline device.
My assumption is this one is hardware debug only.  Patches out of
tree or behind a really scary sounding config variable perhaps?
Other than vendor log I don't think populate log applies to the other
log types yet (they don't mention it anyway!)

Jonathan
diff mbox series

Patch

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4fd1f207c84e..82bdff55bb8d 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -40,6 +40,10 @@ 
 /* CXL 2.0 - 8.2.8.4 */
 #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
 
+/* Default timeout for background operations */
+#define CXL_RC_BG_POLL_CNT		40
+#define CXL_RC_BG_POLL_INTERVAL_MS	1000
+
 /*
  * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to
  * dictate how long to wait for the mailbox to become ready. The new
@@ -312,6 +316,13 @@  static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
 
 		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
 			mbox_cmd->opcode);
+		/*
+		 * Set default background operation timeout, if not set already.
+		 */
+		if (!mbox_cmd->poll_count)
+			mbox_cmd->poll_count = CXL_RC_BG_POLL_CNT;
+		if (!mbox_cmd->poll_interval_ms)
+			mbox_cmd->poll_interval_ms = CXL_RC_BG_POLL_INTERVAL_MS;
 
 		timeout = mbox_cmd->poll_interval_ms;
 		for (i = 0; i < mbox_cmd->poll_count; i++) {