mbox series

[v4,0/7] cxl: Background cmds and device sanitation

Message ID 20230421092321.12741-1-dave@stgolabs.net
Headers show
Series cxl: Background cmds and device sanitation | expand

Message

Davidlohr Bueso April 21, 2023, 9:23 a.m. UTC
Hi,

First thanks for the reviewing and feedback. It took me a while to get back to this in a
form of a new series between vacation, being sick and ensuring I (hopefully) covered all
your requirements. So all of this is intended for v6.5.

The first two patches can directly open the door for Scan Media, etc. and could be picked
up regardless of the rest of the series - I wouldn't want for Sanitize to hold up other
functionality, specially considering the special treatment it gets. That said, the synchronous
mbox bg handling code paths (patch 2) are the least tested in the series (for obvious reasons).

On that note, this series combines the original async semantics from the RFC/v2 specifically
for Sanitation while leaving the rest of the background-capable operations in the sync approach
from v3:
	https://lore.kernel.org/linux-cxl/20230224194652.1990604-1-dave@stgolabs.net/

Specifically, it's worth noting:

    o Supporting async polling for sanitation must protect against out-of-sync driver and hw.
      See testing (1) below.

    o Treating Sanitation as such a special beast can make the code a bit invasive imo,
      which I'm not crazy about but couldn't find a decent alternative. For example I realize
      that this is really ad-hoc code in __cxl_pci_mbox_send_cmd().

    o Regardless of the state of the irq setup, the probing never fails, and falls back to
      async polling as last resource.
      
    o Nothing depends explcitly on CPU cacheline management
      
    o All sysfs files/attributes in the security directory are visible.
    
    o I continue to use __ATTR() macros for sysfs attributes instead of the requested
      DEVICE_ATTR_*() ones because of the security directory (perhaps I'm missing something
      obvious).
    
    o I've dropped the 'security/state' sysfs file creation - I will use the a cached pmem
      security flags, but can be sent later once the rest is settled. The actual sanitize and
      erase commands do ask the hw about security - there is no risk of spamming the mailbox.

Testing.
========

o There are the mock device tests for Sanitize and Secure Erase.

o The latest (v2) qemu bg/sanitize support series is posted here:
	https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/

(1) Window where driver is out of sync with hw (Sanitation async polling).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  159.297482] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  159.298648] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  159.299908] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started
>>>> qemu informs sanitation is done <<<<<
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  165.897345] cxl_pci 0000:37:00.0: Failed to sanitize device : -16
[  171.692050] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  173.373337] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  173.374498] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  173.375727] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started

(2) Perform sanitation of more than one memdev at a time (Sanitation async polling).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[  351.287129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[  351.288403] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[  351.289706] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  353.058614] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  353.059854] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  353.061126] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started
>>>>  qemu informs sanitation is done <<<<<
>>>>  qemu informs sanitation is done <<<<<
[  363.692138] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:36:00.0: Sanitation operation ended
[  365.227416] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended

(3) Perform sanitation of more than one memdev at a time (Sanitation async irq).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[  193.729821] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:c1:00.0: Sending command: 0x4400
[  193.731071] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:c1:00.0: Doorbell wait took 0ms
[  193.732360] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:c1:00.0: Sanitation operation started
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  197.001466] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[  197.002694] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[  197.003956] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
>>>> qemu says sanitation is done <<<<
[  197.731473] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:c1:00.0: Sanitation operation ended
>>>> qemu says sanitation is done <<<<
[  201.003258] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended

(4) Forbid new sanitation while one is in progress (Sanitation asyn irq).
 
[root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/sanitize
disabled
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[   39.284258] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[   39.285459] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[   39.286723] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
[root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/sanitize
sanitize
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[   42.697129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[   42.698323] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[   42.699525] cxl_pci:__cxl_pci_mbox_send_cmd:335: cxl_pci 0000:36:00.0: Mailbox operation had an error: ongoing background operation
[   42.701119] cxl_pci 0000:36:00.0: Failed to sanitize device : -6
>>>> qemu says sanitation is done <<<<
[   43.285334] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended

Applies against 'fixes' branch from cxl.git. Please consider for v6.5. 

Thanks!

Davidlohr Bueso (7):
  cxl/pci: Allocate irq vectors earlier in pci probe
  cxl/mbox: Add background cmd handling machinery
  cxl/mbox: Add sanitation handling machinery
  cxl/mem: Wire up Sanitation support
  cxl/test: Add Sanitize opcode support
  cxl/mem: Support Secure Erase
  cxl/test: Add Secure Erase opcode support

 Documentation/ABI/testing/sysfs-bus-cxl |  29 ++++
 drivers/cxl/core/mbox.c                 |  63 +++++++-
 drivers/cxl/core/memdev.c               | 120 +++++++++++++++
 drivers/cxl/cxl.h                       |   7 +
 drivers/cxl/cxlmem.h                    |  26 ++++
 drivers/cxl/pci.c                       | 192 +++++++++++++++++++++++-
 tools/testing/cxl/test/mem.c            |  52 +++++++
 7 files changed, 483 insertions(+), 6 deletions(-)

--
2.40.0

Comments

Davidlohr Bueso April 23, 2023, 2:05 a.m. UTC | #1
On Fri, 21 Apr 2023, Davidlohr Bueso wrote:

>Testing.
>========
>
>o There are the mock device tests for Sanitize and Secure Erase.
>
>o The latest (v2) qemu bg/sanitize support series is posted here:
>	https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/

fyi here's the support for the cxl-tool cli
     https://github.com/davidlohr/ndctl/tree/cxl-memdev-sanitation-v1

... which is also posted on the list:
     https://lore.kernel.org/linux-cxl/20230423015920.11384-1-dave@stgolabs.net/

Thanks,
Davidlohr