[git,pull] habanalabs for drm-next-6.5

Message ID 20230608103043.GA2699019@ogabbay-vm-u20.habana-labs.com (mailing list archive)
State New, archived

Message

Oded Gabbay June 8, 2023, 10:30 a.m. UTC
Hi Dave, Daniel.

Habanalabs pull request for 6.5.

As Gaudi2 is pretty much stable, this contains mostly bug fixes and small
optimizations and improvements.

Full details are in the signed tag.

Thanks,
Oded

The following changes since commit 2e1492835e439fceba57a5b0f9b17da8e78ffa3d:

  Merge tag 'drm-misc-next-2023-06-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next (2023-06-02 13:39:00 +1000)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git tags/drm-habanalabs-next-2023-06-08

for you to fetch changes up to e6f49e96bc57d34fc0f617f37bfdf62a9b58d2c2:

  accel/habanalabs: refactor error info reset (2023-06-08 12:35:56 +0300)

----------------------------------------------------------------
This tag contains additional habanalabs driver changes for v6.5:

- uAPI changes:
  - Return 0 when the user queries for a h/w or f/w error and no such error has occurred.
    Previously an error was returned in this case.

- New features and improvements:
  - Add a PCI health check when we lose connection with the firmware. This can be used to
    distinguish between the PCI link being down and the firmware getting stuck.
  - Add more info to the error print when a TPC interrupt occurs.
  - Reduce the amount of code under mutex in the command submission of a signal event.

- Firmware related fixes:
  - Fixes to the handshake protocol during f/w initialization.
  - Display information that the f/w sends us when encountering a DMA error.
  - Do soft-reset using a message sent to firmware instead of writing to MMIO.
  - Prepare generic code to extract f/w version numbers.

- Bug fixes and code cleanups. Notable fixes are:
  - Unsecure certain TPC registers that the user should be able to access.
  - Fix handling of QMAN errors.
  - Fix a memory leak when recording errors (to later pass them to the user).
  - Multiple fixes to the razwi interrupt handling code.
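
To illustrate the uAPI change listed above (a query for a h/w or f/w error now returns 0 when no error was recorded), here is a minimal sketch in plain C. It is not the driver code: struct err_event, query_err_old() and query_err_new() are hypothetical names, and the -ENOENT value used for the old behavior is an assumption, since the tag only says that an error was returned.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical record of the last captured h/w or f/w error event. */
struct err_event {
	int valid;                       /* non-zero once an error was captured */
	unsigned long long timestamp_ns; /* hypothetical capture time */
	unsigned int event_id;           /* hypothetical event identifier */
};

/*
 * Old behavior as described in the tag: the query fails when nothing
 * was recorded. The exact error code is an assumption.
 */
int query_err_old(const struct err_event *last, struct err_event *out)
{
	assert(last != NULL && out != NULL);
	if (!last->valid)
		return -ENOENT;
	*out = *last;
	return 0;
}

/*
 * New behavior as described in the tag: the query always succeeds.
 * When nothing was recorded the output is zeroed, so the caller can
 * tell "no event" apart from a real failure of the call itself.
 */
int query_err_new(const struct err_event *last, struct err_event *out)
{
	assert(last != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (last->valid)
		*out = *last;
	return 0;
}
```

With this shape, a caller can treat any non-zero return as a genuine failure and check the returned record (here, the hypothetical valid field) to see whether an event exists.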

----------------------------------------------------------------
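
The PCI health check mentioned in the tag can be pictured with the following sketch. It assumes the check is based on reading the device's vendor ID from PCI configuration space, relying on the general PCI behavior that reads from an unreachable device return all ones. The enum and the diagnose_lost_fw() function are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcomes of a health check after losing the firmware. */
enum link_diag {
	DIAG_PCI_LINK_DOWN,     /* config-space read returned all ones */
	DIAG_FW_STUCK,          /* device still answers on PCI, f/w silent */
	DIAG_UNEXPECTED_DEVICE  /* some other ID was read back */
};

/*
 * vendor_id_read is the 16-bit value read from config-space offset 0x00
 * after the firmware stopped responding; expected_vendor_id is the ID
 * the driver bound to. Both are supplied by the caller in this sketch.
 */
enum link_diag diagnose_lost_fw(uint16_t vendor_id_read,
				uint16_t expected_vendor_id)
{
	/* 0xFFFF is never a valid vendor ID, so it cannot be expected. */
	assert(expected_vendor_id != 0xFFFF);

	if (vendor_id_read == 0xFFFF)
		return DIAG_PCI_LINK_DOWN;
	if (vendor_id_read == expected_vendor_id)
		return DIAG_FW_STUCK;
	return DIAG_UNEXPECTED_DEVICE;
}
```

In a kernel driver the read itself would typically go through a helper such as pci_read_config_word(); the sketch keeps that out so the classification logic can be compiled and tried in user space.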
Dafna Hirschfeld (6):
      accel/habanalabs: add helper to extract the FW major/minor
      accel/habanalabs: rename fw_{major/minor}_version to fw_inner_{major/minor}_ver
      accel/habanalabs: extract and save the FW's SW major/minor/sub-minor
      accel/habanalabs: check fw version using sw version
      accel/habanalabs: do soft-reset using cpucp packet
      accel/habanalabs: add missing tpc interrupt info

Dan Carpenter (1):
      accel/habanalabs: fix gaudi2_get_tpc_idle_status() return

Dani Liberman (4):
      accel/habanalabs: use binning info when handling razwi
      accel/habanalabs: mask part of hmmu page fault captured address
      accel/habanalabs: add description to several info ioctls
      accel/habanalabs: refactor error info reset

Koby Elbaz (8):
      accel/habanalabs: remove commented code that won't be used
      accel/habanalabs: minimize encapsulation signal mutex lock time
      accel/habanalabs: refactor abort of completions and waits
      accel/habanalabs: poll for device status update following WFE cmd
      accel/habanalabs: fix a static warning - 'dubious: x & !y'
      accel/habanalabs: rename security functions related arguments
      accel/habanalabs: upon DMA errors, use FW-extracted error cause
      accel/habanalabs: update state when loading boot fit

Moti Haimovski (3):
      accel/habanalabs: fix bug in free scratchpad memory
      accel/habanalabs: call to HW/FW err returns 0 when no events exist
      accel/habanalabs: fix mem leak in capture user mappings

Oded Gabbay (5):
      accel/habanalabs: set unused bit as reserved
      accel/habanalabs: align to latest firmware specs
      accel/habanalabs: print max timeout value on CS stuck
      accel/habanalabs: remove sim code
      accel/habanalabs: move ioctl error print to debug level

Ofir Bitton (7):
      accel/habanalabs: unsecure TPC bias registers
      accel/habanalabs: add pci health check during heartbeat
      accel/habanalabs: always fetch pci addr_dec error info
      accel/habanalabs: remove support for mmu disable
      accel/habanalabs: fix bug of not fetching addr_dec info
      accel/habanalabs: unsecure TSB_CFG_MTRR regs
      accel/habanalabs: add event queue extra validation

Rakesh Ughreja (1):
      accel/habanalabs: allow user to modify EDMA RL register

Tal Cohen (1):
      accel/habanalabs: ignore false positive razwi

Tom Rix (1):
      accel/habanalabs: remove variable gaudi_irq_name

Tomer Tayar (3):
      accel/habanalabs: expose debugfs files later
      accel/habanalabs: use lower QM in QM errors handling
      accel/habanalabs: print qman data on error only for lower qman

Yang Li (1):
      accel/habanalabs: Fix some kernel-doc comments

 drivers/accel/habanalabs/common/command_buffer.c   |   6 -
 .../accel/habanalabs/common/command_submission.c   |  61 ++--
 drivers/accel/habanalabs/common/debugfs.c          |  60 ++--
 drivers/accel/habanalabs/common/device.c           | 112 ++++---
 drivers/accel/habanalabs/common/firmware_if.c      | 212 ++++++++++---
 drivers/accel/habanalabs/common/habanalabs.h       |  77 ++---
 drivers/accel/habanalabs/common/habanalabs_drv.c   |   9 +-
 drivers/accel/habanalabs/common/habanalabs_ioctl.c |  35 +--
 drivers/accel/habanalabs/common/irq.c              |   2 +-
 drivers/accel/habanalabs/common/memory.c           | 104 +------
 drivers/accel/habanalabs/common/mmu/mmu.c          |  56 +---
 drivers/accel/habanalabs/common/security.c         |  57 ++--
 drivers/accel/habanalabs/gaudi/gaudi.c             |  13 +-
 drivers/accel/habanalabs/gaudi2/gaudi2.c           | 334 ++++++++-------------
 drivers/accel/habanalabs/gaudi2/gaudi2P.h          |   2 +-
 drivers/accel/habanalabs/gaudi2/gaudi2_security.c  |  15 +-
 drivers/accel/habanalabs/goya/goya.c               |   3 -
 drivers/accel/habanalabs/goya/goya_coresight.c     |   9 +-
 drivers/accel/habanalabs/include/common/cpucp_if.h |  22 +-
 .../accel/habanalabs/include/common/hl_boot_if.h   |  41 +--
 .../include/gaudi2/asic_reg/gaudi2_regs.h          |  11 +
 .../accel/habanalabs/include/gaudi2/gaudi2_fw_if.h |   2 +-
 include/uapi/drm/habanalabs_accel.h                |  10 +
 23 files changed, 557 insertions(+), 696 deletions(-)