From patchwork Mon Dec 7 20:39:22 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Marciniszyn, Mike" X-Patchwork-Id: 7789641 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 9027C9F39B for ; Mon, 7 Dec 2015 20:39:29 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 95E0120569 for ; Mon, 7 Dec 2015 20:39:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8603720555 for ; Mon, 7 Dec 2015 20:39:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932892AbbLGUjZ (ORCPT ); Mon, 7 Dec 2015 15:39:25 -0500 Received: from mga14.intel.com ([192.55.52.115]:21722 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932716AbbLGUjZ (ORCPT ); Mon, 7 Dec 2015 15:39:25 -0500 Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP; 07 Dec 2015 12:39:24 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,396,1444719600"; d="scan'208";a="855885591" Received: from sedona.ch.intel.com ([143.182.228.65]) by fmsmga001.fm.intel.com with ESMTP; 07 Dec 2015 12:39:24 -0800 Received: from phlsvsles11.ph.intel.com (phlsvsles11.ph.intel.com [10.228.195.43]) by sedona.ch.intel.com (8.13.6/8.14.3/Standard MailSET/Hub) with ESMTP id tB7KdNUl026982; Mon, 7 Dec 2015 13:39:24 -0700 Received: from phlsvslse11.ph.intel.com (localhost [127.0.0.1]) by phlsvsles11.ph.intel.com with ESMTP id tB7KdMOD008031; Mon, 7 Dec 2015 15:39:23 -0500 Subject: [PATCH] staging/rdma/hfi1: convert buffers allocated atomic to per cpu To: devel@driverdev.osuosl.org From: Mike Marciniszyn Cc: linux-rdma@vger.kernel.org, dledford@redhat.com, linux-next@vger.kernel.org Date: Mon, 07 Dec 2015 15:39:22 -0500 Message-ID: <20151207203922.8009.36069.stgit@phlsvslse11.ph.intel.com> User-Agent: StGit/0.16 MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Profiling has shown the the atomic is a performance issue for the pio hot path. If multiple cpus allocated an sc's buffer, the cacheline containing the atomic will bounce from L0 to L0. Convert the atomic to a percpu variable. Reviewed-by: Jubin John Signed-off-by: Mike Marciniszyn --- drivers/staging/rdma/hfi1/pio.c | 40 ++++++++++++++++++++++++++++++---- drivers/staging/rdma/hfi1/pio.h | 2 +- drivers/staging/rdma/hfi1/pio_copy.c | 6 +++-- 3 files changed, 40 insertions(+), 8 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c index eab58c1..ed8043b 100644 --- a/drivers/staging/rdma/hfi1/pio.c +++ b/drivers/staging/rdma/hfi1/pio.c @@ -660,6 +660,24 @@ void set_pio_integrity(struct send_context *sc) write_kctxt_csr(dd, hw_context, SC(CHECK_ENABLE), reg); } +static u32 get_buffers_allocated(struct send_context *sc) +{ + int cpu; + u32 ret = 0; + + for_each_possible_cpu(cpu) + ret += *per_cpu_ptr(sc->buffers_allocated, cpu); + return ret; +} + +static void reset_buffers_allocated(struct send_context *sc) +{ + int cpu; + + for_each_possible_cpu(cpu) + (*per_cpu_ptr(sc->buffers_allocated, cpu)) = 0; +} + /* * Allocate a NUMA relative send context structure of the given type along * with a HW context. @@ -668,7 +686,7 @@ struct send_context *sc_alloc(struct hfi1_devdata *dd, int type, uint hdrqentsize, int numa) { struct send_context_info *sci; - struct send_context *sc; + struct send_context *sc = NULL; dma_addr_t pa; unsigned long flags; u64 reg; @@ -686,10 +704,20 @@ struct send_context *sc_alloc(struct hfi1_devdata *dd, int type, if (!sc) return NULL; + sc->buffers_allocated = alloc_percpu(u32); + if (!sc->buffers_allocated) { + kfree(sc); + dd_dev_err(dd, + "Cannot allocate buffers_allocated per cpu counters\n" + ); + return NULL; + } + spin_lock_irqsave(&dd->sc_lock, flags); ret = sc_hw_alloc(dd, type, &sw_index, &hw_context); if (ret) { spin_unlock_irqrestore(&dd->sc_lock, flags); + free_percpu(sc->buffers_allocated); kfree(sc); return NULL; } @@ -705,7 +733,6 @@ struct send_context *sc_alloc(struct hfi1_devdata *dd, int type, spin_lock_init(&sc->credit_ctrl_lock); INIT_LIST_HEAD(&sc->piowait); INIT_WORK(&sc->halt_work, sc_halted); - atomic_set(&sc->buffers_allocated, 0); init_waitqueue_head(&sc->halt_wait); /* grouping is always single context for now */ @@ -866,6 +893,7 @@ void sc_free(struct send_context *sc) spin_unlock_irqrestore(&dd->sc_lock, flags); kfree(sc->sr); + free_percpu(sc->buffers_allocated); kfree(sc); } @@ -1029,7 +1057,7 @@ int sc_restart(struct send_context *sc) /* kernel context */ loop = 0; while (1) { - count = atomic_read(&sc->buffers_allocated); + count = get_buffers_allocated(sc); if (count == 0) break; if (loop > 100) { @@ -1197,7 +1225,8 @@ int sc_enable(struct send_context *sc) sc->sr_head = 0; sc->sr_tail = 0; sc->flags = 0; - atomic_set(&sc->buffers_allocated, 0); + /* the alloc lock insures no fast path allocation */ + reset_buffers_allocated(sc); /* * Clear all per-context errors. Some of these will be set when @@ -1373,7 +1402,8 @@ retry: /* there is enough room */ - atomic_inc(&sc->buffers_allocated); + preempt_disable(); + this_cpu_inc(*sc->buffers_allocated); /* read this once */ head = sc->sr_head; diff --git a/drivers/staging/rdma/hfi1/pio.h b/drivers/staging/rdma/hfi1/pio.h index 0bb885c..53d3e0a 100644 --- a/drivers/staging/rdma/hfi1/pio.h +++ b/drivers/staging/rdma/hfi1/pio.h @@ -130,7 +130,7 @@ struct send_context { spinlock_t credit_ctrl_lock ____cacheline_aligned_in_smp; u64 credit_ctrl; /* cache for credit control */ u32 credit_intr_count; /* count of credit intr users */ - atomic_t buffers_allocated; /* count of buffers allocated */ + u32 __percpu *buffers_allocated;/* count of buffers allocated */ wait_queue_head_t halt_wait; /* wait until kernel sees interrupt */ }; diff --git a/drivers/staging/rdma/hfi1/pio_copy.c b/drivers/staging/rdma/hfi1/pio_copy.c index 8972bbc..ebb0baf 100644 --- a/drivers/staging/rdma/hfi1/pio_copy.c +++ b/drivers/staging/rdma/hfi1/pio_copy.c @@ -160,7 +160,8 @@ void pio_copy(struct hfi1_devdata *dd, struct pio_buf *pbuf, u64 pbc, } /* finished with this buffer */ - atomic_dec(&pbuf->sc->buffers_allocated); + this_cpu_dec(*pbuf->sc->buffers_allocated); + preempt_enable(); } /* USE_SHIFTS is faster in user-space tests on a Xeon X5570 @ 2.93GHz */ @@ -854,5 +855,6 @@ void seg_pio_copy_end(struct pio_buf *pbuf) } /* finished with this buffer */ - atomic_dec(&pbuf->sc->buffers_allocated); + this_cpu_dec(*pbuf->sc->buffers_allocated); + preempt_enable(); }