From patchwork Wed Jul 18 11:38:25 2018
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 10532213
Subject: Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
To: Steve Wise, 'Max Gurtovoy', 'Leon Romanovsky'
Cc: 'Doug Ledford', 'Jason Gunthorpe', 'RDMA mailing list', 'Saeed Mahameed',
 'linux-netdev'
References: <20180716083012.15410-1-leon@kernel.org>
 <0cf29652-9034-6283-ef36-95de4588980f@grimberg.me>
 <20180716103046.GJ3152@mtr-leonro.mtl.com>
 <1cb63259-9fb6-59b0-3a34-0659973228ea@mellanox.com>
 <40d49fe1-c548-31ec-7daa-b19056215d69@mellanox.com>
 <243215dc-2b06-9c99-a0cb-8a45e0257077@opengridcomputing.com>
 <3f827784-3089-2375-9feb-b3c1701d7471@mellanox.com>
 <01cd01d41dce$992f4f30$cb8ded90$@opengridcomputing.com>
From: Sagi Grimberg
Message-ID: <0834cae6-33d6-3526-7d85-f5cae18c5487@grimberg.me>
Date: Wed, 18 Jul 2018 14:38:25 +0300
In-Reply-To: <01cd01d41dce$992f4f30$cb8ded90$@opengridcomputing.com>
X-Mailing-List: linux-rdma@vger.kernel.org

>> IMO we must fulfil the user's wish to connect to N queues and not
>> reduce it because of affinity overlaps. So in order to push Leon's
>> patch we must also fix blk_mq_rdma_map_queues to do a best-effort
>> mapping according to the affinity and map the rest in a naive way
>> (that way we will *always* map all the queues).
>
> That is what I would expect also. For example, on my node, which has
> 16 CPUs and 2 NUMA nodes, I observe much better nvmf IOPS performance
> by setting up my 16 driver completion event queues such that each is
> bound to a node-local CPU. So I end up with each node-local CPU having
> 2 queues bound to it. Without adding support in iw_cxgb4 for
> ib_get_vector_affinity(), this works fine. I assumed adding
> ib_get_vector_affinity() would allow this to all "just work" by
> default, but I'm running into this connection failure issue.
>
> I don't understand exactly what the blk_mq layer is trying to do, but
> I assume it has ingress event queues and processing that it is trying
> to align with the driver's ingress CQ event handling, so everybody
> stays on the same CPU (or at least the same node). But something else
> is going on. Is there documentation on how this works somewhere?

Does this (untested) patch help?

It really is still a best effort thing...

---
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 3eb169f15842..dbe962cb537d 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -30,29 +30,34 @@ static int get_first_sibling(unsigned int cpu)
 	return cpu;
 }
 
-int blk_mq_map_queues(struct blk_mq_tag_set *set)
+void blk_mq_map_queue(struct blk_mq_tag_set *set, unsigned int cpu)
 {
 	unsigned int *map = set->mq_map;
 	unsigned int nr_queues = set->nr_hw_queues;
-	unsigned int cpu, first_sibling;
+	unsigned int first_sibling;
 
-	for_each_possible_cpu(cpu) {
-		/*
-		 * First do sequential mapping between CPUs and queues.
-		 * In case we still have CPUs to map, and we have some number of
-		 * threads per cores then map sibling threads to the same queue for
-		 * performace optimizations.
-		 */
-		if (cpu < nr_queues) {
+	/*
+	 * First do sequential mapping between CPUs and queues.
+	 * In case we still have CPUs to map, and we have some number of
+	 * threads per cores then map sibling threads to the same queue for
+	 * performace optimizations.
+	 */
+	if (cpu < nr_queues) {
+		map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+	} else {
+		first_sibling = get_first_sibling(cpu);
+		if (first_sibling == cpu)
 			map[cpu] = cpu_to_queue_index(nr_queues, cpu);
-		} else {
-			first_sibling = get_first_sibling(cpu);
-			if (first_sibling == cpu)
-				map[cpu] = cpu_to_queue_index(nr_queues, cpu);
-			else
-				map[cpu] = map[first_sibling];
-		}
+		else
+			map[cpu] = map[first_sibling];
 	}
+}
+EXPORT_SYMBOL_GPL(blk_mq_map_queue);
+
+int blk_mq_map_queues(struct blk_mq_tag_set *set)
+{
+	for_each_possible_cpu(cpu)
+		blk_mq_map_queue(set, cpu);
 
 	return 0;
 }
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..5e91789bea5b 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -35,6 +35,10 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
 	const struct cpumask *mask;
 	unsigned int queue, cpu;
 
+	/* reset all to */
+	for_each_possible_cpu(cpu)
+		set->mq_map[cpu] = UINT_MAX;
+
 	for (queue = 0; queue < set->nr_hw_queues; queue++) {
 		mask = ib_get_vector_affinity(dev, first_vec + queue);
 		if (!mask)
@@ -44,6 +48,11 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
 		set->mq_map[cpu] = queue;
 	}
 
+	for_each_possible_cpu(cpu) {
+		if (set->mq_map[cpu] == UINT_MAX)
+			blk_mq_map_queue(set, cpu);
+	}
+
 	return 0;
 
 fallback:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e3147eb74222..7a9848a82475 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -283,6 +283,7 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
+void blk_mq_map_queue(struct blk_mq_tag_set *set, unsigned int cpu);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
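
For readers working through the patch, here is a small self-contained
userspace sketch (plain C, nothing kernel-specific) of the mapping policy
it implements: first honour the per-vector affinity masks, then map every
CPU that was left over in a naive way so that all CPUs always end up with
a queue. The CPU count, queue count and affinity table below are made up
for illustration, and a plain modulo stands in for the sibling-aware
blk_mq_map_queue() fallback used in the actual patch; this is not the
kernel code itself.

/*
 * Userspace illustration of the "best effort" mapping discussed above:
 * CPUs covered by a queue's affinity mask are mapped to that queue
 * first, and any CPU left unmapped falls back to a naive round-robin.
 *
 * NR_CPUS, NR_QUEUES and the affinity table are invented for the
 * example; CPUs 6 and 7 are deliberately covered by no mask to mimic
 * the overlap problem the thread is about.
 */
#include <stdio.h>

#define NR_CPUS    8
#define NR_QUEUES  4
#define UNMAPPED  -1

/* affinity[q][cpu] != 0 means "queue q's vector may run on cpu". */
static const int affinity[NR_QUEUES][NR_CPUS] = {
	{ 1, 1, 0, 0, 0, 0, 0, 0 },
	{ 0, 0, 1, 1, 0, 0, 0, 0 },
	{ 0, 0, 0, 0, 1, 0, 0, 0 },
	{ 0, 0, 0, 0, 0, 1, 0, 0 },
};

int main(void)
{
	int mq_map[NR_CPUS];
	int queue, cpu;

	/* Start with every CPU unmapped. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		mq_map[cpu] = UNMAPPED;

	/* Pass 1: honour the affinity masks where they cover a CPU. */
	for (queue = 0; queue < NR_QUEUES; queue++)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (affinity[queue][cpu])
				mq_map[cpu] = queue;

	/* Pass 2: naive fallback so no CPU is left without a queue. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mq_map[cpu] == UNMAPPED)
			mq_map[cpu] = cpu % NR_QUEUES;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %d -> queue %d\n", cpu, mq_map[cpu]);

	return 0;
}

Built with any C compiler, it prints the resulting CPU-to-queue table;
with the table above, CPUs 6 and 7 are not covered by any vector's mask
and only receive a queue through the second pass, which is the situation
the added for_each_possible_cpu() loop in the blk-mq-rdma.c hunk handles.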