From patchwork Thu Nov 14 12:18:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michal Swiatkowski X-Patchwork-Id: 13875068 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 692071DE2D5 for ; Thu, 14 Nov 2024 12:20:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731586817; cv=none; b=PcyVIHjDO/UA/QoZrCxnMecAbd+SsXuPx4xwSUurs8tugPPlWvTixVuxdjPC4GYVvggdot2RrP6rdeEThW6AtuXTXAmGzYsq0TRICNp0Bwr+ZfQjy1nyA6yaaQ044yoX1v4K9mhJ//IJgYc2SAVb6Ar0Hy0jy86DkfvjI7qQp84= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731586817; c=relaxed/simple; bh=+z/flLXDPWQFpt/9OGG0zjRnCS6dvPaAFL9ZsnCA74c=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=Mup45TL8cazlEq9EFrsRyI7fY+v7Gb1ZjI16WSpwk6ymQJQiO7PWAN9CPESoNKLrCU5eaAdKVwRU59fUEja5XiuNYlAEJmXSoq2cQOnFK1X4usYV2syrDodTfUWDP0Hw19IJtPGT4WNre29V/u4Fzbkz7IKNRKoKXweE6n3dHEs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aq7MChNQ; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aq7MChNQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731586816; x=1763122816; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=+z/flLXDPWQFpt/9OGG0zjRnCS6dvPaAFL9ZsnCA74c=; b=aq7MChNQfM5mIaw7EbgztHWStQ6N+aoYC7A1fpb6qeebwSabaBKeD0WB BCZnFNdr1TV+pHPpKJPdjgFM2jE0OnnVXrlvaFHlFop4wnXNOHUmOl8JV XfL5ZbYBiSkoC3IRJNRU5JT6HejjWx9nQItc86PdZ3Z1L8k4FOXGNmatn TueljfHkxnWk9L/vH3/sbmZL8sqIcJAUhvGqkTH7rvowxrHpPFKsez1uX 3dvG89pqHRukGK6qLgYYsW/IJ7Zc+PgoY9mAsbquIq4aBPn2SgydoU7Ht afRltpeqE+3VfAHAcGAJC97Qr4zMsUz9u8RORlvOd6Fb3Aa3gC36uxEGK A==; X-CSE-ConnectionGUID: sraxuaXFS0ObjPtNUr0xpQ== X-CSE-MsgGUID: UPgFEm2iSLO1kqcogIy82g== X-IronPort-AV: E=McAfee;i="6700,10204,11255"; a="56916852" X-IronPort-AV: E=Sophos;i="6.12,154,1728975600"; d="scan'208";a="56916852" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Nov 2024 04:20:15 -0800 X-CSE-ConnectionGUID: Diw4m/j7TWCsU+He0rynmw== X-CSE-MsgGUID: Pv7LrwqhQLG5SD5fyRVbow== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="93135624" Received: from gk3153-dr2-r750-36946.igk.intel.com ([10.102.20.192]) by orviesa003.jf.intel.com with ESMTP; 14 Nov 2024 04:20:10 -0800 From: Michal Swiatkowski To: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org, pawel.chmielewski@intel.com, sridhar.samudrala@intel.com, jacob.e.keller@intel.com, pio.raczynski@gmail.com, konrad.knitter@intel.com, marcin.szycik@intel.com, wojciech.drewek@intel.com, nex.sw.ncis.nat.hpm.dev@intel.com, przemyslaw.kitszel@intel.com, jiri@resnulli.us, horms@kernel.org, David.Laight@ACULAB.COM, pmenzel@molgen.mpg.de, mschmidt@redhat.com, rafal.romanowski@intel.com Subject: [iwl-next v8 0/9] ice: managing MSI-X in driver Date: Thu, 14 Nov 2024 13:18:32 +0100 Message-ID: <20241114122009.97416-1-michal.swiatkowski@linux.intel.com> X-Mailer: git-send-email 2.42.0 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Hi, It is another try to allow user to manage amount of MSI-X used for each feature in ice. First was via devlink resources API, it wasn't accepted in upstream. Also static MSI-X allocation using devlink resources isn't really user friendly. This try is using more dynamic way. "Dynamic" across whole kernel when platform supports it and "dynamic" across the driver when not. To achieve that reuse global devlink parameter pf_msix_max and pf_msix_min. It fits how ice hardware counts MSI-X. In case of ice amount of MSI-X reported on PCI is a whole MSI-X for the card (with MSI-X for VFs also). Having pf_msix_max allow user to statically set how many MSI-X he wants on PF and how many should be reserved for VFs. pf_msix_min is used to set minimum number of MSI-X with which ice driver should probe correctly. Meaning of this field in case of dynamic vs static allocation: - on system with dynamic MSI-X allocation support * alloc pf_msix_min as static, rest will be allocated dynamically - on system without dynamic MSI-X allocation support * try alloc pf_msix_max as static, minimum acceptable result is pf_msix_min As Jesse and Piotr suggested pf_msix_max and pf_msix_min can (an probably should) be stored in NVM. This patchset isn't implementing that. Dynamic (kernel or driver) way means that splitting MSI-X across the RDMA and eth in case there is a MSI-X shortage isn't correct. Can work when dynamic is only on driver site, but can't when dynamic is on kernel site. Let's remove this code and move to MSI-X allocation feature by feature. If there is no more MSI-X for a feature, a feature is working with less MSI-X or it is turned off. There is a regression here. With MSI-X splitting user can run RDMA and eth even on system with not enough MSI-X. Now only eth will work. RDMA can be turned on by changing number of PF queues (lowering) and reprobe RDMA driver. Example: 72 CPU number, eth, RDMA and flow director (1 MSI-X), 1 MSI-X for OICR on PF, and 1 more for RDMA. Card is using 1 + 72 + 1 + 72 + 1 = 147. We set pf_msix_min = 2, pf_msix_max = 128 OICR: 1 eth: 72 flow director: 1 RDMA: 128 - 74 = 54 We can change number of queues on pf to 36 and do devlink reinit OICR: 1 eth: 36 RDMA: 73 flow director: 1 We can also (implemented in "ice: enable_rdma devlink param") turned RDMA off. OICR: 1 eth: 72 RDMA: 0 (turned off) flow director: 1 After this changes we have a static base vector for SRIOV (SIOV probably in the feature). Last patch from this series is simplifying managing VF MSI-X code based on static vector. Now changing queues using ethtool is also changing MSI-X. If there is enough MSI-X it is always one to one. When there is not enough there will be more queues than MSI-X. There is a lack of ability to set how many queues should be used per MSI-X. Maybe we should introduce another ethtool param for it? Sth like queues_per_vector? v7 --> v8: [7] * fix unrolling in devlink parameters register function (patch 2) v6 --> v7: [6] * use vu32 for devlink MSI-X parameters instead of u16 (patch 2) * < instead of <= for MSI-X min parameter validation (patch 2) * use u32 for MSI-X values (patch 2, 8) v5 --> v6: [5] * set default MSI-X max value based on needs instead of const define (patch 3) v4 --> v5: [4] * count combined queues in ethtool for case the vectors aren't mapped 1:1 to queues (patch 1) * change min_t to min where the casting isn't needed (and can hide problems) (patch 4) * load msix_max and msix_min value after devlink reload; it accidentally wasn't added after removing loading in probe path to mitigate error from devl_para_driverinit...() (patch 2) * add documentation in develink/ice for new parameters (patch 2) v3 --> v4: [3] * drop unnecessary text in devlink validation comments * assume that devl_param_driverinit...() shouldn't return error in normal execution path v2 --> v3: [2] * move flow director init before RDMA init * fix unrolling RDMA MSI-X allocation * add comment in commit message about lowering control RDMA MSI-X amount v1 --> v2: [1] * change permanent MSI-X cmode parameters to driverinit * remove locking during devlink parameter registration (it is now locked for whole init/deinit part) [7] https://lore.kernel.org/netdev/20241104121337.129287-1-michal.swiatkowski@linux.intel.com/ [6] https://lore.kernel.org/netdev/20241028100341.16631-1-michal.swiatkowski@linux.intel.com/ [5] https://lore.kernel.org/netdev/20241024121230.5861-1-michal.swiatkowski@linux.intel.com/T/#t [4] https://lore.kernel.org/netdev/20240930120402.3468-1-michal.swiatkowski@linux.intel.com/ [3] https://lore.kernel.org/netdev/20240808072016.10321-1-michal.swiatkowski@linux.intel.com/ [2] https://lore.kernel.org/netdev/20240801093115.8553-1-michal.swiatkowski@linux.intel.com/ [1] https://lore.kernel.org/netdev/20240213073509.77622-1-michal.swiatkowski@linux.intel.com/ Note: previous patchset is on dev-queue. I will be unavailable some time, so sending fixed next version basing it on Tony main. Michal Swiatkowski (8): ice: devlink PF MSI-X max and min parameter ice: remove splitting MSI-X between features ice: get rid of num_lan_msix field ice, irdma: move interrupts code to irdma ice: treat dyn_allowed only as suggestion ice: enable_rdma devlink param ice: simplify VF MSI-X managing ice: init flow director before RDMA Documentation/networking/devlink/ice.rst | 11 + drivers/infiniband/hw/irdma/hw.c | 2 - drivers/infiniband/hw/irdma/main.c | 46 ++- drivers/infiniband/hw/irdma/main.h | 3 + .../net/ethernet/intel/ice/devlink/devlink.c | 109 +++++++ drivers/net/ethernet/intel/ice/ice.h | 21 +- drivers/net/ethernet/intel/ice/ice_base.c | 10 +- drivers/net/ethernet/intel/ice/ice_ethtool.c | 6 +- drivers/net/ethernet/intel/ice/ice_idc.c | 64 +--- drivers/net/ethernet/intel/ice/ice_irq.c | 275 ++++++------------ drivers/net/ethernet/intel/ice/ice_irq.h | 13 +- drivers/net/ethernet/intel/ice/ice_lib.c | 35 ++- drivers/net/ethernet/intel/ice/ice_main.c | 6 +- drivers/net/ethernet/intel/ice/ice_sriov.c | 154 +--------- include/linux/net/intel/iidc.h | 2 + 15 files changed, 335 insertions(+), 422 deletions(-) base-commit: 31a1f8752f7df7e3d8122054fbef02a9a8bff38f