From patchwork Tue Apr 16 16:30:01 2013
X-Patchwork-Submitter: Roland Dreier
X-Patchwork-Id: 2449981
From: Roland Dreier
To: Markus Stockhausen, linux-rdma@vger.kernel.org
Cc: Eric Dumazet
Subject: [RFC/PATCH v4] IPoIB: Leave space in skb linear buffer for IP headers
Date: Tue, 16 Apr 2013 09:30:01 -0700
Message-Id: <1366129801-29234-1-git-send-email-roland@kernel.org>
X-Mailer: git-send-email 1.8.1.2
X-Mailing-List: linux-rdma@vger.kernel.org

From: Roland Dreier

Markus Stockhausen noticed that IPoIB was spending significant time
doing memcpy() in __pskb_pull_tail().  He found that this is because
his adapter reports a maximum MTU of 4K, which causes IPoIB datagram
mode to receive all of the actual data in a separate page in the
fragment list.

We're already allocating extra tailroom for the skb linear part, so we
might as well use it.  In fact, we might as well allocate a linear part
big enough that all of the data fits there in the relatively common
case of a 2K IB MTU, and only use a fragment page for a 4K IB MTU.

Cc: Eric Dumazet
Reported-by: Markus Stockhausen
Signed-off-by: Roland Dreier
---
v4: Leave enough space in the linear part of the skb so that all data
    ends up there with a 2K IB MTU.  Still not sure how this affects
    performance with a 4K IB MTU (it should be better, since we avoid
    pulling the IP headers out of the first fragment).
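As a reviewer aid, here is a minimal userspace sketch (not part of the
patch) of the sizing arithmetic above.  It assumes IB_GRH_BYTES == 40
and 4K pages, and reimplements ipoib_ud_need_sg() for illustration:

#include <stdio.h>

#define PAGE_SIZE		4096
#define IB_GRH_BYTES		40
#define IPOIB_UD_HEAD_SIZE	(IB_GRH_BYTES + 2048)
#define IPOIB_UD_BUF_SIZE(mtu)	((mtu) + IB_GRH_BYTES)

/* Mirrors the driver's test: a fragment page is only needed when the
 * full receive buffer no longer fits in a single page. */
static int ipoib_ud_need_sg(int max_ib_mtu)
{
	return IPOIB_UD_BUF_SIZE(max_ib_mtu) > PAGE_SIZE;
}

int main(void)
{
	int mtus[] = { 2048, 4096 };

	for (int i = 0; i < 2; i++) {
		int mtu = mtus[i];
		int spill = ipoib_ud_need_sg(mtu) ?
			IPOIB_UD_BUF_SIZE(mtu) - IPOIB_UD_HEAD_SIZE : 0;

		printf("IB MTU %d: buf %d bytes, need_sg %d, "
		       "bytes in fragment page: %d\n",
		       mtu, IPOIB_UD_BUF_SIZE(mtu),
		       ipoib_ud_need_sg(mtu), spill);
	}
	return 0;
}

With a 2K IB MTU the whole packet (GRH included) lands in the linear
part; with a 4K IB MTU only the bytes beyond IB_GRH_BYTES + 2048 spill
into the fragment page, so the IP headers always stay linear.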
 drivers/infiniband/ulp/ipoib/ipoib.h    |  6 ++-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 78 +++++++++++++++++++--------------
 2 files changed, 49 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb71aaa..7a56a8e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -64,8 +64,9 @@ enum ipoib_flush_level {
 enum {
 	IPOIB_ENCAP_LEN		  = 4,
 
-	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
-	IPOIB_UD_RX_SG		  = 2, /* max buffer needed for 4K mtu */
+	/* enough so w/ 2K mtu, everything is in linear part of skb */
+	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + 2048,
+	IPOIB_UD_RX_SG		  = 2, /* max buffer needed for 4K mtu w/ 4K pages */
 
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU + IPOIB_ENCAP_LEN,
@@ -155,6 +156,7 @@ struct ipoib_mcast {
 
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
+	struct page    *page;
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..88a4ea3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -92,13 +92,15 @@ void ipoib_free_ah(struct kref *kref)
 }
 
 static void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
+				  struct page *page,
 				  u64 mapping[IPOIB_UD_RX_SG])
 {
 	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
 		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE,
 				    DMA_FROM_DEVICE);
-		ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE,
-				  DMA_FROM_DEVICE);
+		if (page)
+			ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE,
+					  DMA_FROM_DEVICE);
 	} else
 		ib_dma_unmap_single(priv->ca, mapping[0],
 				    IPOIB_UD_BUF_SIZE(priv->max_ib_mtu),
@@ -107,23 +109,18 @@ static void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
 
 static void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv,
 				   struct sk_buff *skb,
+				   struct page *page,
 				   unsigned int length)
 {
-	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
-		skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
-		unsigned int size;
+	if (ipoib_ud_need_sg(priv->max_ib_mtu) &&
+	    length > IPOIB_UD_HEAD_SIZE) {
 		/*
-		 * There is only two buffers needed for max_payload = 4K,
+		 * There are only two buffers needed for max_payload = 4K,
 		 * first buf size is IPOIB_UD_HEAD_SIZE
 		 */
-		skb->tail += IPOIB_UD_HEAD_SIZE;
-		skb->len  += length;
-
-		size = length - IPOIB_UD_HEAD_SIZE;
-
-		skb_frag_size_set(frag, size);
-		skb->data_len += size;
-		skb->truesize += PAGE_SIZE;
+		skb_put(skb, IPOIB_UD_HEAD_SIZE);
+		skb_add_rx_frag(skb, 0, page, 0,
+				length - IPOIB_UD_HEAD_SIZE, PAGE_SIZE);
 	} else
 		skb_put(skb, length);
 
@@ -143,9 +140,11 @@ static int ipoib_ib_post_receive(struct net_device *dev, int id)
 	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret);
-		ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[id].mapping);
+		ipoib_ud_dma_unmap_rx(priv,
+				      priv->rx_ring[id].page, priv->rx_ring[id].mapping);
 		dev_kfree_skb_any(priv->rx_ring[id].skb);
 		priv->rx_ring[id].skb = NULL;
+		put_page(priv->rx_ring[id].page);
+		priv->rx_ring[id].page = NULL;
 	}
 
 	return ret;
@@ -156,18 +155,13 @@ static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id)
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
 	int buf_size;
-	int tailroom;
 	u64 *mapping;
+	struct page **page;
 
-	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
-		buf_size = IPOIB_UD_HEAD_SIZE;
-		tailroom = 128; /* reserve some tailroom for IP/TCP headers */
-	} else {
-		buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);
-		tailroom = 0;
-	}
+	buf_size = ipoib_ud_need_sg(priv->max_ib_mtu) ?
+		IPOIB_UD_HEAD_SIZE : IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);
 
-	skb = dev_alloc_skb(buf_size + tailroom + 4);
+	skb = dev_alloc_skb(buf_size + 4);
 	if (unlikely(!skb))
 		return NULL;
 
@@ -184,21 +178,24 @@ static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id)
 	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0])))
 		goto error;
 
-	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
-		struct page *page = alloc_page(GFP_ATOMIC);
-		if (!page)
+	page = &priv->rx_ring[id].page;
+	if (ipoib_ud_need_sg(priv->max_ib_mtu) && !*page) {
+		*page = alloc_page(GFP_ATOMIC);
+		if (!*page)
 			goto partial_error;
-		skb_fill_page_desc(skb, 0, page, 0, PAGE_SIZE);
 		mapping[1] =
-			ib_dma_map_page(priv->ca, page,
+			ib_dma_map_page(priv->ca, *page,
 					0, PAGE_SIZE, DMA_FROM_DEVICE);
 		if (unlikely(ib_dma_mapping_error(priv->ca, mapping[1])))
-			goto partial_error;
+			goto map_error;
 	}
 
 	priv->rx_ring[id].skb = skb;
 	return skb;
 
+map_error:
+	put_page(*page);
+	*page = NULL;
 partial_error:
 	ib_dma_unmap_single(priv->ca, mapping[0], buf_size, DMA_FROM_DEVICE);
 error:
@@ -230,6 +227,7 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;
 	struct sk_buff *skb;
+	struct page *page;
 	u64 mapping[IPOIB_UD_RX_SG];
 	union ib_gid *dgid;
 
@@ -249,9 +247,11 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 			ipoib_warn(priv, "failed recv event "
 				   "(status=%d, wrid=%d vend_err %x)\n",
 				   wc->status, wr_id, wc->vendor_err);
-		ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping);
+		ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].page, priv->rx_ring[wr_id].mapping);
 		dev_kfree_skb_any(skb);
 		priv->rx_ring[wr_id].skb = NULL;
+		put_page(priv->rx_ring[wr_id].page);
+		priv->rx_ring[wr_id].page = NULL;
 		return;
 	}
 
@@ -265,20 +265,29 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 	memcpy(mapping, priv->rx_ring[wr_id].mapping,
 	       IPOIB_UD_RX_SG * sizeof *mapping);
 
+	if (wc->byte_len > IPOIB_UD_HEAD_SIZE) {
+		page = priv->rx_ring[wr_id].page;
+		priv->rx_ring[wr_id].page = NULL;
+	} else {
+		page = NULL;
+	}
+
 	/*
 	 * If we can't allocate a new RX buffer, dump
 	 * this packet and reuse the old buffer.
 	 */
 	if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id))) {
 		++dev->stats.rx_dropped;
+		if (page)
+			priv->rx_ring[wr_id].page = page;
 		goto repost;
 	}
 
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
 
-	ipoib_ud_dma_unmap_rx(priv, mapping);
-	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
+	ipoib_ud_dma_unmap_rx(priv, page, mapping);
+	ipoib_ud_skb_put_frags(priv, skb, page, wc->byte_len);
 
 	/* First byte of dgid signals multicast when 0xff */
 	dgid = &((struct ib_grh *)skb->data)->dgid;
@@ -861,9 +870,12 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush)
 				if (!rx_req->skb)
 					continue;
 				ipoib_ud_dma_unmap_rx(priv,
+						      priv->rx_ring[i].page,
 						      priv->rx_ring[i].mapping);
 				dev_kfree_skb_any(rx_req->skb);
 				rx_req->skb = NULL;
+				put_page(priv->rx_ring[i].page);
+				priv->rx_ring[i].page = NULL;
 			}
 
 			goto timeout;
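One more reviewer aid: the subtlest part of the receive path above is
that the fragment page is taken out of the ring entry only when the
packet actually spilled past the linear part; otherwise it is left in
place so the next ipoib_alloc_rx_skb() call can skip alloc_page().  A
minimal userspace model of that decision (illustration only; struct
rx_buf and claim_page() here are stand-ins, not driver code):

#include <stdio.h>
#include <stdlib.h>

#define IPOIB_UD_HEAD_SIZE	(40 + 2048)	/* IB_GRH_BYTES + 2048 */

struct rx_buf {
	void *page;		/* stand-in for struct ipoib_rx_buf.page */
};

/* Take the page out of the ring entry only if the packet used it;
 * a NULL return means the page stays cached for the next receive. */
static void *claim_page(struct rx_buf *buf, int byte_len)
{
	void *page = NULL;

	if (byte_len > IPOIB_UD_HEAD_SIZE) {
		page = buf->page;
		buf->page = NULL;	/* next alloc gets a fresh page */
	}
	return page;
}

int main(void)
{
	struct rx_buf buf = { .page = malloc(4096) };
	void *taken;

	taken = claim_page(&buf, 40 + 1500);	/* fits in linear part */
	printf("1500-byte packet consumes page: %s\n", taken ? "yes" : "no");

	taken = claim_page(&buf, 40 + 4096);	/* spills into the page */
	printf("4096-byte packet consumes page: %s\n", taken ? "yes" : "no");

	free(taken);
	return 0;
}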