From: Saeed Mahameed
Date: Sun, 11 Jun 2017 12:32:05 +0300
Subject: Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
To: Sagi Grimberg, Majd Dibbiny, Eran Ben Elisha, amira@mellanox.com
Cc: Tariq Toukan, Christoph Hellwig, Doug Ledford, Leon Romanovsky, linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <8057a5c0-24a4-6211-c318-1d037b645604@grimberg.me>
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 9780091

On Thu, Jun 8, 2017 at 3:29 PM, Sagi Grimberg wrote:
>
>>>>> My interpretation is that mlx5 tried to do this for the (rather
>>>>> esoteric in my mind) case where the platform does not have enough
>>>>> vectors for the driver to allocate percpu. In this case, the next
>>>>> best thing is to stay as close to the device affinity as possible.
>>>>>
>>>>
>>>> No, we did it for the reason that mlx5e netdevice assumes that
>>>> IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
>>>> are always bound to the numa close to the device. and the mlx5e driver
>>>> choose those IRQs to spread
>>>> the RSS hash only into them and never uses other IRQs/Cores
>>>
>>>
>>>
>>> OK, that explains a lot of weirdness I've seen with mlx5e.
>>>
>>> Can you explain why you're using only a single numa node for your RSS
>>> table? What does it buy you? You open RX rings for _all_ cpus but
>>> only spread on part of them? I must be missing something here...
>>
>>
>> Adding Tariq,
>>
>> this is also part of the weirdness :), we do that to make sure any OOB
>> test you run you always get the best performance
>> and we will guarantee to always use close numa cores.
>
>
> Well I wish I knew that before :( I got to a point where I started
> to seriously doubt the math truth of xor/toeplitz hashing strength :)
>
> I'm sure you ran plenty of performance tests, but from my experience,
> application locality makes much more difference than device locality,
> especially when the application needs to touch the data...
>
>> we open RX rings on all of the cores in case if the user want to
>> change the RSS table to point to the whole thing on the fly "ethtool
>> -X"
>
>
> That is very counter intuitive afaict, is it documented anywhere?
>
> users might rely on the (absolutely reasonable) assumption that if a
> NIC exposes X rx rings, rx hashing should spread across all of them and
> not a subset.
>

This is why we want to remove this assumption from mlx5e

>> But we are willing to change that, Tariq can provide the patch,
>> without changing this mlx5e is broken.
>
>
> What patch? to modify the RSS spread? What is exactly broken?

The current mlx5 netdev with your patches might spread traffic _ONLY_ to
the far numa.

To fix this in mlx5e we need something like this:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 41cd22a223dc..15499865784f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3733,18 +3733,8 @@ void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
 				    u32 *indirection_rqt, int len,
 				    int num_channels)
 {
-	int node = mdev->priv.numa_node;
-	int node_num_of_cores;
 	int i;
 
-	if (node == -1)
-		node = first_online_node;
-
-	node_num_of_cores = cpumask_weight(cpumask_of_node(node));
-
-	if (node_num_of_cores)
-		num_channels = min_t(int, num_channels, node_num_of_cores);
-

We are working on such a patch, and we would like to have it submitted
and accepted along with or before your series. Also, we will have to run
a performance & functional regression cycle on all the patches combined.

>
> So I'm not sure how to move forward here, should we modify the
> indirection table construction to not rely on the unique affinity
> mappings?
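
To make the effect of the removed lines concrete, here is a rough
user-space sketch of the two behaviours. It is not the driver code:
build_indir_table(), channels_in_use() and the node_core_cap parameter are
hypothetical names made up for illustration, and the round-robin fill
(entry i maps to channel i % num_channels) is an assumption, since the
hunk above is cut off before the fill loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch only, NOT the mlx5e implementation.
 * Assumption: the table is filled round-robin over the channels.
 * node_core_cap models the clamp removed by the diff: the number of
 * cores on the device's NUMA node. A value of 0 means "no clamp".
 */
static void build_indir_table(unsigned int *table, size_t len,
                              unsigned int num_channels,
                              unsigned int node_core_cap)
{
    size_t i;

    assert(table != NULL && num_channels > 0);

    /* Old behaviour: never spread over more channels than local cores. */
    if (node_core_cap > 0 && node_core_cap < num_channels)
        num_channels = node_core_cap;

    for (i = 0; i < len; i++)
        table[i] = (unsigned int)(i % num_channels);
}

/*
 * Number of channels the table points at. Channels are assigned
 * contiguously from 0, so the highest index plus one is the count.
 */
static unsigned int channels_in_use(const unsigned int *table, size_t len)
{
    unsigned int max = 0;
    size_t i;

    if (len == 0)
        return 0;

    for (i = 0; i < len; i++)
        if (table[i] > max)
            max = table[i];

    return max + 1;
}
```

With an 8-entry table, 8 channels and node_core_cap = 4, this sketch
fills 0,1,2,3,0,1,2,3, so only 4 channels ever receive traffic. With
node_core_cap = 0 it fills 0..7 and all 8 channels are used, which is
the behaviour the diff appears to aim for.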