From patchwork Fri Jul 22 13:17:29 2011
X-Patchwork-Submitter: Ian Campbell
X-Patchwork-Id: 999182
From: Ian Campbell
To: netdev@vger.kernel.org, linux-nfs@vger.kernel.org
CC: Ian Campbell, "David S. Miller", "James E.J. Bottomley",
    Dimitris Michailidis, Casey Leedom, Yevgeny Petrilin,
    Eric Dumazet, Michał Mirosław, linux-scsi@vger.kernel.org
Subject: [PATCH 09/13] net: add support for per-paged-fragment destructors
Date: Fri, 22 Jul 2011 14:17:29 +0100
Message-ID: <1311340653-19336-9-git-send-email-ian.campbell@citrix.com>
In-Reply-To: <1311340095.12772.57.camel@zakaz.uk.xensource.com>
References: <1311340095.12772.57.camel@zakaz.uk.xensource.com>
X-Mailing-List: linux-nfs@vger.kernel.org

differently. No references are held by the network stack on the struct
page (it is up to the caller to manage this as necessary); instead the
network stack will track references via the count embedded in the
destructor structure. When this reference count reaches zero the
destructor will be called and the caller can take the necessary steps
to release the page (i.e. release the struct page reference itself).

The intention is that callers can use this callback to delay completion
to _their_ callers until the network stack has completely released the
page, in order to prevent use-after-free or modification of data pages
which are still in use by the stack.

It is allowable (indeed expected) for a caller to share a single
destructor instance between multiple pages injected into the stack,
e.g. a group of pages included in a single higher-level operation might
share a destructor which is used to complete that higher-level
operation.

NB: a small number of drivers use skb_frag_t independently of struct
sk_buff, so this patch is slightly larger than necessary. I did
consider leaving skb_frag_t alone and defining a new (but similar)
structure to be used in the struct sk_buff itself.
This would also have the advantage of more clearly separating the two
uses, which is useful since there are now special reference counting
accessors for skb_frag_t within a struct sk_buff but not (necessarily)
for those used outside of an skb.

Signed-off-by: Ian Campbell
Cc: "David S. Miller"
Cc: "James E.J. Bottomley"
Cc: Dimitris Michailidis
Cc: Casey Leedom
Cc: Yevgeny Petrilin
Cc: Eric Dumazet
Cc: Michał Mirosław
Cc: netdev@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
---
 drivers/net/cxgb4/sge.c       |   14 +++++++-------
 drivers/net/cxgb4vf/sge.c     |   18 +++++++++---------
 drivers/net/mlx4/en_rx.c      |    2 +-
 drivers/scsi/cxgbi/libcxgbi.c |    2 +-
 include/linux/skbuff.h        |   31 ++++++++++++++++++++++++++-----
 net/core/skbuff.c             |   17 +++++++++++++++++
 6 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
index f1813b5..3e7c4b3 100644
--- a/drivers/net/cxgb4/sge.c
+++ b/drivers/net/cxgb4/sge.c
@@ -1416,7 +1416,7 @@ static inline void copy_frags(struct sk_buff *skb,
 	unsigned int n;

 	/* usually there's just one frag */
-	skb_frag_set_page(skb, 0, gl->frags[0].page);
+	skb_frag_set_page(skb, 0, gl->frags[0].page.p); /* XXX */
 	ssi->frags[0].page_offset = gl->frags[0].page_offset + offset;
 	ssi->frags[0].size = gl->frags[0].size - offset;
 	ssi->nr_frags = gl->nfrags;
@@ -1425,7 +1425,7 @@ static inline void copy_frags(struct sk_buff *skb,
 		memcpy(&ssi->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));

 	/* get a reference to the last page, we don't own it */
-	get_page(gl->frags[n].page);
+	get_page(gl->frags[n].page.p); /* XXX */
 }

 /**
@@ -1482,7 +1482,7 @@ static void t4_pktgl_free(const struct pkt_gl *gl)
 	const skb_frag_t *p;

 	for (p = gl->frags, n = gl->nfrags - 1; n--; p++)
-		put_page(p->page);
+		put_page(p->page.p); /* XXX */
 }

 /*
@@ -1635,7 +1635,7 @@ static void restore_rx_bufs(const struct pkt_gl *si, struct sge_fl *q,
 		else
 			q->cidx--;
 		d = &q->sdesc[q->cidx];
-		d->page = si->frags[frags].page;
+		d->page = si->frags[frags].page.p; /* XXX */
 		d->dma_addr |= RX_UNMAPPED_BUF;
 		q->avail++;
 	}
@@ -1717,7 +1717,7 @@ static int process_responses(struct sge_rspq *q, int budget)
 			for (frags = 0, fp = si.frags; ; frags++, fp++) {
 				rsd = &rxq->fl.sdesc[rxq->fl.cidx];
 				bufsz = get_buf_size(rsd);
-				fp->page = rsd->page;
+				fp->page.p = rsd->page; /* XXX */
 				fp->page_offset = q->offset;
 				fp->size = min(bufsz, len);
 				len -= fp->size;
@@ -1734,8 +1734,8 @@ static int process_responses(struct sge_rspq *q, int budget)
 					    get_buf_addr(rsd),
 					    fp->size, DMA_FROM_DEVICE);
-			si.va = page_address(si.frags[0].page) +
-				si.frags[0].page_offset;
+			si.va = page_address(si.frags[0].page.p) +
+				si.frags[0].page_offset; /* XXX */
 			prefetch(si.va);

diff --git a/drivers/net/cxgb4vf/sge.c b/drivers/net/cxgb4vf/sge.c
index f4c4480..0a0dda1 100644
--- a/drivers/net/cxgb4vf/sge.c
+++ b/drivers/net/cxgb4vf/sge.c
@@ -1397,7 +1397,7 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
 		skb_copy_to_linear_data(skb, gl->va, pull_len);

 		ssi = skb_shinfo(skb);
-		skb_frag_set_page(skb, 0, gl->frags[0].page);
+		skb_frag_set_page(skb, 0, gl->frags[0].page.p); /* XXX */
 		ssi->frags[0].page_offset = gl->frags[0].page_offset + pull_len;
 		ssi->frags[0].size = gl->frags[0].size - pull_len;
 		if (gl->nfrags > 1)
@@ -1410,7 +1410,7 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
 		skb->truesize += skb->data_len;

 		/* Get a reference for the last page, we don't own it */
-		get_page(gl->frags[gl->nfrags - 1].page);
+		get_page(gl->frags[gl->nfrags - 1].page.p); /* XXX */
 	}

 out:
@@ -1430,7 +1430,7 @@ void t4vf_pktgl_free(const struct pkt_gl *gl)

 	frag = gl->nfrags - 1;
 	while (frag--)
-		put_page(gl->frags[frag].page);
+		put_page(gl->frags[frag].page.p); /* XXX */
 }

 /**
@@ -1450,7 +1450,7 @@ static inline void copy_frags(struct sk_buff *skb,
 	unsigned int n;

 	/* usually there's just one frag */
-	skb_frag_set_page(skb, 0, gl->frags[0].page);
+	skb_frag_set_page(skb, 0, gl->frags[0].page.p); /* XXX */
 	si->frags[0].page_offset = gl->frags[0].page_offset + offset;
 	si->frags[0].size = gl->frags[0].size - offset;
 	si->nr_frags = gl->nfrags;
@@ -1460,7 +1460,7 @@ static inline void copy_frags(struct sk_buff *skb,
 		memcpy(&si->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));

 	/* get a reference to the last page, we don't own it */
-	get_page(gl->frags[n].page);
+	get_page(gl->frags[n].page.p); /* XXX */
 }

 /**
@@ -1633,7 +1633,7 @@ static void restore_rx_bufs(const struct pkt_gl *gl, struct sge_fl *fl,
 		else
 			fl->cidx--;
 		sdesc = &fl->sdesc[fl->cidx];
-		sdesc->page = gl->frags[frags].page;
+		sdesc->page = gl->frags[frags].page.p; /* XXX */
 		sdesc->dma_addr |= RX_UNMAPPED_BUF;
 		fl->avail++;
 	}
@@ -1721,7 +1721,7 @@ int process_responses(struct sge_rspq *rspq, int budget)
 				BUG_ON(rxq->fl.avail == 0);
 				sdesc = &rxq->fl.sdesc[rxq->fl.cidx];
 				bufsz = get_buf_size(sdesc);
-				fp->page = sdesc->page;
+				fp->page.p = sdesc->page; /* XXX */
 				fp->page_offset = rspq->offset;
 				fp->size = min(bufsz, len);
 				len -= fp->size;
@@ -1739,8 +1739,8 @@ int process_responses(struct sge_rspq *rspq, int budget)
 			dma_sync_single_for_cpu(rspq->adapter->pdev_dev,
 						get_buf_addr(sdesc),
 						fp->size, DMA_FROM_DEVICE);
-			gl.va = (page_address(gl.frags[0].page) +
-				 gl.frags[0].page_offset);
+			gl.va = (page_address(gl.frags[0].page.p) +
+				 gl.frags[0].page_offset); /* XXX */
 			prefetch(gl.va);

 			/*
diff --git a/drivers/net/mlx4/en_rx.c b/drivers/net/mlx4/en_rx.c
index 21a89e0..c5d01ce 100644
--- a/drivers/net/mlx4/en_rx.c
+++ b/drivers/net/mlx4/en_rx.c
@@ -418,7 +418,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 			break;

 		/* Save page reference in skb */
-		__skb_frag_set_page(&skb_frags_rx[nr], skb_frags[nr].page);
+		__skb_frag_set_page(&skb_frags_rx[nr], skb_frags[nr].page.p); /* XXX */
 		skb_frags_rx[nr].size = skb_frags[nr].size;
 		skb_frags_rx[nr].page_offset = skb_frags[nr].page_offset;
 		dma = be64_to_cpu(rx_desc->data[nr].addr);
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 949ee48..8d16a74 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -1823,7 +1823,7 @@ static int sgl_read_to_frags(struct scatterlist *sg, unsigned int sgoffset,
 			return -EINVAL;
 		}

-		frags[i].page = page;
+		frags[i].page.p = page;
 		frags[i].page_offset = sg->offset + sgoffset;
 		frags[i].size = copy;
 		i++;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bc6bd24..9818fe2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -135,8 +135,17 @@ struct sk_buff;

 typedef struct skb_frag_struct skb_frag_t;

+struct skb_frag_destructor {
+	atomic_t ref;
+	int (*destroy)(void *data);
+	void *data;
+};
+
 struct skb_frag_struct {
-	struct page *page;
+	struct {
+		struct page *p;
+		struct skb_frag_destructor *destructor;
+	} page;
 #if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
 	__u32 page_offset;
 	__u32 size;
@@ -1129,7 +1138,8 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 {
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

-	frag->page = page;
+	frag->page.p = page;
+	frag->page.destructor = NULL;
 	frag->page_offset = off;
 	frag->size = size;
 }
@@ -1648,7 +1658,7 @@ static inline void netdev_free_page(struct net_device *dev, struct page *page)
  */
 static inline struct page *__skb_frag_page(const skb_frag_t *frag)
 {
-	return frag->page;
+	return frag->page.p;
 }

 /**
@@ -1659,9 +1669,12 @@ static inline struct page *__skb_frag_page(const skb_frag_t *frag)
  */
 static inline const struct page *skb_frag_page(const skb_frag_t *frag)
 {
-	return frag->page;
+	return frag->page.p;
 }

+extern void skb_frag_destructor_ref(struct skb_frag_destructor *destroy);
+extern void skb_frag_destructor_unref(struct skb_frag_destructor *destroy);
+
 /**
  * __skb_frag_ref - take an addition reference on a paged fragment.
  * @frag: the paged fragment
@@ -1670,6 +1683,10 @@ static inline const struct page *skb_frag_page(const skb_frag_t *frag)
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
+	if (unlikely(frag->page.destructor)) {
+		skb_frag_destructor_ref(frag->page.destructor);
+		return;
+	}
 	get_page(__skb_frag_page(frag));
 }

@@ -1693,6 +1710,10 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
  */
 static inline void __skb_frag_unref(skb_frag_t *frag)
 {
+	if (unlikely(frag->page.destructor)) {
+		skb_frag_destructor_unref(frag->page.destructor);
+		return;
+	}
 	put_page(__skb_frag_page(frag));
 }

@@ -1745,7 +1766,7 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
  */
 static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
 {
-	frag->page = page;
+	frag->page.p = page;
 	__skb_frag_ref(frag);
 }

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2133600..bdc6f6e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -292,6 +292,23 @@ struct sk_buff *dev_alloc_skb(unsigned int length)
 }
 EXPORT_SYMBOL(dev_alloc_skb);

+void skb_frag_destructor_ref(struct skb_frag_destructor *destroy)
+{
+	BUG_ON(destroy == NULL);
+	atomic_inc(&destroy->ref);
+}
+EXPORT_SYMBOL(skb_frag_destructor_ref);
+
+void skb_frag_destructor_unref(struct skb_frag_destructor *destroy)
+{
+	if (destroy == NULL)
+		return;
+
+	if (atomic_dec_and_test(&destroy->ref))
+		destroy->destroy(destroy->data);
+}
+EXPORT_SYMBOL(skb_frag_destructor_unref);
+
 static void skb_drop_list(struct sk_buff **listp)
 {
 	struct sk_buff *list = *listp;