From patchwork Thu Aug 2 19:56:52 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554089
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 1/9] dmapool: fix boundary comparison
To: Matthew Wilcox, Christoph Hellwig, Marek Szyprowski, Sathya Prakash,
 Chaitra P B, Suganath Prabu Subramani, iommu@lists.linux-foundation.org,
 linux-mm, linux-scsi, MPT-FusionLinux.pdl@broadcom.com
Date: Thu, 2 Aug 2018 15:56:52 -0400

Fix the boundary comparison when constructing the list of free blocks
for the case that 'size' is a power of two.  Since 'boundary' is also a
power of two, that would make 'boundary' a multiple of 'size', in which
case a single block would never cross the boundary.  This bug would
cause some of the allocated memory to be wasted (but not leaked).

Example:

size = 512
boundary = 2048
allocation = 4096

Address range
   0 -  511
 512 - 1023
1024 - 1535
1536 - 2047 *
2048 - 2559
2560 - 3071
3072 - 3583
3584 - 4095 *

Prior to this fix, the address ranges marked with "*" would not have
been used even though they didn't cross the given boundary.
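To make the arithmetic concrete, the following standalone userspace
sketch (not part of the patch) mimics the loop in pool_initialise_page()
and counts the usable blocks under each comparison for the example
above:

#include <stdio.h>

/* Simulates the freelist construction for one allocation. */
static unsigned int count_blocks(unsigned int size, unsigned int boundary,
				 unsigned int allocation, int fixed)
{
	unsigned int offset = 0, next_boundary = boundary, blocks = 0;

	do {
		unsigned int next = offset + size;

		/* ">=" is the old comparison; ">" is the fixed one. */
		if (fixed ? (next + size) > next_boundary
			  : (next + size) >= next_boundary) {
			next = next_boundary;
			next_boundary += boundary;
		}
		blocks++;		/* the block at 'offset' is usable */
		offset = next;
	} while (offset < allocation);

	return blocks;
}

int main(void)
{
	printf("old (>=): %u blocks\n", count_blocks(512, 2048, 4096, 0));
	printf("new (>) : %u blocks\n", count_blocks(512, 2048, 4096, 1));
	return 0;	/* prints 6 vs. 8: the two "*" ranges are reclaimed */
}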
Fixes: e34f44b3517f ("pool: Improve memory usage for devices which can't cross boundaries")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
As part of developing a later patch in the series ("dmapool: reduce
footprint in struct page"), I wrote a standalone program that iterates
over all the combinations of PAGE_SIZE, 'size', and 'boundary', and
performs a series of consistency checks on the math in some new
functions, and it turned up this bug.  With this change, all the
consistency checks pass.  So I am fairly confident that this change
doesn't break other combinations of parameters.

Even though I described this as a "fix", it does not seem important
enough to Cc: stable from a strict reading of the stable kernel rules.
IOW, it is not "bothering" anyone.

--- linux/mm/dmapool.c.orig	2018-08-01 17:57:04.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-01 17:57:16.000000000 -0400
@@ -210,7 +210,7 @@ static void pool_initialise_page(struct

 	do {
 		unsigned int next = offset + pool->size;
-		if (unlikely((next + pool->size) >= next_boundary)) {
+		if (unlikely((next + pool->size) > next_boundary)) {
 			next = next_boundary;
 			next_boundary += pool->boundary;
 		}

From patchwork Thu Aug 2 19:57:28 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554093
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 2/9] dmapool: cleanup error messages
Date: Thu, 2 Aug 2018 15:57:28 -0400
Remove code duplication in error messages.  It is now safe to pass a
NULL dev to dev_err(), so the checks to avoid doing so are no longer
necessary.

Example:

Error message with dev != NULL:
mpt3sas 0000:02:00.0: dma_pool_destroy chain pool, (____ptrval____) busy

Same error message with dev == NULL before patch:
dma_pool_destroy chain pool, (____ptrval____) busy

Same error message with dev == NULL after patch:
(NULL device *): dma_pool_destroy chain pool, (____ptrval____) busy

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
--- linux/mm/dmapool.c.orig	2018-08-02 09:54:25.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 09:57:58.000000000 -0400
@@ -289,13 +289,9 @@ void dma_pool_destroy(struct dma_pool *p
 		page = list_entry(pool->page_list.next,
 				  struct dma_page, page_list);
 		if (is_page_busy(page)) {
-			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_destroy %s, %p busy\n",
-					pool->name, page->vaddr);
-			else
-				pr_err("dma_pool_destroy %s, %p busy\n",
-				       pool->name, page->vaddr);
+			dev_err(pool->dev,
+				"dma_pool_destroy %s, %p busy\n",
+				pool->name, page->vaddr);
 			/* leak the still-in-use consistent memory */
 			list_del(&page->page_list);
 			kfree(page);
@@ -357,13 +353,9 @@ void *dma_pool_alloc(struct dma_pool *po
 		for (i = sizeof(page->offset); i < pool->size; i++) {
 			if (data[i] == POOL_POISON_FREED)
 				continue;
-			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_alloc %s, %p (corrupted)\n",
-					pool->name, retval);
-			else
-				pr_err("dma_pool_alloc %s, %p (corrupted)\n",
-				       pool->name, retval);
+			dev_err(pool->dev,
+				"dma_pool_alloc %s, %p (corrupted)\n",
+				pool->name, retval);

 			/*
 			 * Dump the first 4 bytes even if they are not
@@ -418,13 +410,9 @@ void dma_pool_free(struct dma_pool *pool
 	page = pool_find_page(pool, dma);
 	if (!page) {
 		spin_unlock_irqrestore(&pool->lock, flags);
-		if (pool->dev)
-			dev_err(pool->dev,
-				"dma_pool_free %s, %p/%lx (bad dma)\n",
-				pool->name, vaddr, (unsigned long)dma);
-		else
-			pr_err("dma_pool_free %s, %p/%lx (bad dma)\n",
-			       pool->name, vaddr, (unsigned long)dma);
+		dev_err(pool->dev,
+			"dma_pool_free %s, %p/%lx (bad dma)\n",
+			pool->name, vaddr, (unsigned long)dma);
 		return;
 	}
@@ -432,13 +420,9 @@ void dma_pool_free(struct dma_pool *pool
 #ifdef DMAPOOL_DEBUG
 	if ((dma - page->dma) != offset) {
 		spin_unlock_irqrestore(&pool->lock, flags);
-		if (pool->dev)
-			dev_err(pool->dev,
-				"dma_pool_free %s, %p (bad vaddr)/%pad\n",
-				pool->name, vaddr, &dma);
-		else
-			pr_err("dma_pool_free %s, %p (bad vaddr)/%pad\n",
-			       pool->name, vaddr, &dma);
+		dev_err(pool->dev,
+			"dma_pool_free %s, %p (bad vaddr)/%pad\n",
+			pool->name, vaddr, &dma);
 		return;
 	}
 	{
@@ -449,12 +433,9 @@ void dma_pool_free(struct dma_pool *pool
 				continue;
 			}
 			spin_unlock_irqrestore(&pool->lock, flags);
-			if (pool->dev)
-				dev_err(pool->dev, "dma_pool_free %s, dma %pad already free\n",
-					pool->name, &dma);
-			else
-				pr_err("dma_pool_free %s, dma %pad already free\n",
-				       pool->name, &dma);
+			dev_err(pool->dev,
+				"dma_pool_free %s, dma %pad already free\n",
+				pool->name, &dma);
 			return;
 		}
 	}
From patchwork Thu Aug 2 19:58:04 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554095
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 3/9] dmapool: cleanup dma_pool_destroy
Message-ID: <924a1e83-285a-a258-ed45-ad035411bd83@cybernetics.com>
Date: Thu, 2 Aug 2018 15:58:04 -0400

Remove a small amount of code duplication between dma_pool_destroy()
and pool_free_page() in preparation for adding more code without
having to duplicate it.  No functional changes.
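For readers unfamiliar with the idiom, the conversion in the diff below
amounts to the following drain pattern (a sketch with a hypothetical
item type, not the patch's code):

#include <linux/list.h>
#include <linux/slab.h>

struct item {			/* hypothetical list element */
	struct list_head node;
};

static void drain_all(struct list_head *head)
{
	struct item *it;

	/* Pop entries until the list is empty; each iteration removes
	 * the head (in the patch, inside pool_free_page()), so the loop
	 * terminates without a separate list_empty() test. */
	while ((it = list_first_entry_or_null(head, struct item, node))) {
		list_del(&it->node);
		kfree(it);
	}
}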
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
--- linux/mm/dmapool.c.orig	2018-08-02 09:59:15.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 10:01:26.000000000 -0400
@@ -249,13 +249,22 @@ static inline bool is_page_busy(struct d

 static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
 {
+	void *vaddr = page->vaddr;
 	dma_addr_t dma = page->dma;

+	list_del(&page->page_list);
+
+	if (is_page_busy(page)) {
+		dev_err(pool->dev,
+			"dma_pool_destroy %s, %p busy\n",
+			pool->name, vaddr);
+		/* leak the still-in-use consistent memory */
+	} else {
 #ifdef DMAPOOL_DEBUG
-	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
+		memset(vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-	dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
-	list_del(&page->page_list);
+		dma_free_coherent(pool->dev, pool->allocation, vaddr, dma);
+	}
 	kfree(page);
 }

@@ -269,6 +278,7 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
+	struct dma_page *page;
 	bool empty = false;

 	if (unlikely(!pool))
@@ -284,19 +294,10 @@ void dma_pool_destroy(struct dma_pool *p
 	device_remove_file(pool->dev, &dev_attr_pools);
 	mutex_unlock(&pools_reg_lock);

-	while (!list_empty(&pool->page_list)) {
-		struct dma_page *page;
-		page = list_entry(pool->page_list.next,
-				  struct dma_page, page_list);
-		if (is_page_busy(page)) {
-			dev_err(pool->dev,
-				"dma_pool_destroy %s, %p busy\n",
-				pool->name, page->vaddr);
-			/* leak the still-in-use consistent memory */
-			list_del(&page->page_list);
-			kfree(page);
-		} else
-			pool_free_page(pool, page);
+	while ((page = list_first_entry_or_null(&pool->page_list,
+						struct dma_page,
+						page_list))) {
+		pool_free_page(pool, page);
 	}

 	kfree(pool);
From patchwork Thu Aug 2 19:58:40 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554099
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 4/9] dmapool: improve scalability of dma_pool_alloc
Message-ID: <1dbe6204-17fc-efd9-2381-48186cae2b94@cybernetics.com>
Date: Thu, 2 Aug 2018 15:58:40 -0400

dma_pool_alloc() scales poorly when allocating a large number of pages
because it does a linear scan of all previously-allocated pages before
allocating a new one.  Improve its scalability by maintaining a
separate list of pages that have free blocks ready to (re)allocate.
In big O notation, this improves the algorithm from O(n^2) to O(n).

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
Changes since v1:

*) In v1, there was one (original) list for all pages and one (new)
   list for pages with free blocks.  In v2, there is one list for pages
   with free blocks and one list for pages without free blocks, and
   pages are moved back and forth between the two lists.  This is to
   avoid bloating struct dma_page with extra list pointers, which is
   important so that a later patch can move its fields into struct
   page.

*) Use list_first_entry_or_null instead of !list_empty/list_first_entry.

Note that pool_find_page() will be removed entirely by a later patch,
so the extra code there won't stay for long.
---
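To picture the two-list bookkeeping before reading the diff, here is a
minimal userspace sketch (toy structures and names, not the kernel
code) of the scheme described above:

struct toy_page {			/* stand-in for struct dma_page */
	struct toy_page *next;		/* singly linked for brevity */
	unsigned int free_blocks;
};

struct toy_pool {
	struct toy_page *avail;		/* pages with >= 1 free block */
	struct toy_page *full;		/* pages with no free blocks */
};

/* O(1): any page on 'avail' can satisfy an allocation, so there is no
 * need to scan every allocated page the way the old code did. */
static struct toy_page *toy_pick_page(struct toy_pool *pool)
{
	return pool->avail;	/* NULL means: allocate a new page */
}

/* Called after a block is handed out from the head page of 'avail'. */
static void toy_block_taken(struct toy_pool *pool)
{
	struct toy_page *page = pool->avail;

	if (--page->free_blocks == 0) {	/* page filled up: move to 'full' */
		pool->avail = page->next;
		page->next = pool->full;
		pool->full = page;
	}
}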
--- linux/mm/dmapool.c.orig	2018-08-02 10:01:26.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 10:03:46.000000000 -0400
@@ -15,11 +15,16 @@
  * Many older drivers still have their own code to do this.
  *
  * The current design of this allocator is fairly simple.  The pool is
- * represented by the 'struct dma_pool' which keeps a doubly-linked list of
- * allocated pages.  Each page in the page_list is split into blocks of at
- * least 'size' bytes.  Free blocks are tracked in an unsorted singly-linked
- * list of free blocks within the page.  Used blocks aren't tracked, but we
- * keep a count of how many are currently allocated from each page.
+ * represented by the 'struct dma_pool'.  Each allocated page is split into
+ * blocks of at least 'size' bytes.  Free blocks are tracked in an unsorted
+ * singly-linked list of free blocks within the page.  Used blocks aren't
+ * tracked, but we keep a count of how many are currently allocated from each
+ * page.
+ *
+ * The pool keeps two doubly-linked list of allocated pages.  The 'available'
+ * list tracks pages that have one or more free blocks, and the 'full' list
+ * tracks pages that have no free blocks.  Pages are moved from one list to
+ * the other as their blocks are allocated and freed.
  */

 #include

@@ -43,7 +48,10 @@
 #endif

 struct dma_pool {		/* the pool */
-	struct list_head page_list;
+#define POOL_FULL_IDX   0
+#define POOL_AVAIL_IDX  1
+#define POOL_N_LISTS    2
+	struct list_head page_list[POOL_N_LISTS];
 	spinlock_t lock;
 	size_t size;
 	struct device *dev;
@@ -54,7 +62,7 @@ struct dma_pool {		/* the pool */
 };

 struct dma_page {		/* cacheable header for 'allocation' bytes */
-	struct list_head page_list;
+	struct list_head dma_list;
 	void *vaddr;
 	dma_addr_t dma;
 	unsigned int in_use;
@@ -70,7 +78,6 @@ show_pools(struct device *dev, struct de
 	unsigned temp;
 	unsigned size;
 	char *next;
-	struct dma_page *page;
 	struct dma_pool *pool;

 	next = buf;
@@ -84,11 +91,18 @@ show_pools(struct device *dev, struct de
 	list_for_each_entry(pool, &dev->dma_pools, pools) {
 		unsigned pages = 0;
 		unsigned blocks = 0;
+		int list_idx;

 		spin_lock_irq(&pool->lock);
-		list_for_each_entry(page, &pool->page_list, page_list) {
-			pages++;
-			blocks += page->in_use;
+		for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
+			struct dma_page *page;
+
+			list_for_each_entry(page,
+					    &pool->page_list[list_idx],
+					    dma_list) {
+				pages++;
+				blocks += page->in_use;
+			}
 		}
 		spin_unlock_irq(&pool->lock);

@@ -163,7 +177,8 @@ struct dma_pool *dma_pool_create(const c

 	retval->dev = dev;

-	INIT_LIST_HEAD(&retval->page_list);
+	INIT_LIST_HEAD(&retval->page_list[0]);
+	INIT_LIST_HEAD(&retval->page_list[1]);
 	spin_lock_init(&retval->lock);
 	retval->size = size;
 	retval->boundary = boundary;
@@ -252,7 +267,7 @@ static void pool_free_page(struct dma_po
 	void *vaddr = page->vaddr;
 	dma_addr_t dma = page->dma;

-	list_del(&page->page_list);
+	list_del(&page->dma_list);

 	if (is_page_busy(page)) {
 		dev_err(pool->dev,
@@ -278,8 +293,8 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
-	struct dma_page *page;
 	bool empty = false;
+	int list_idx;

 	if (unlikely(!pool))
 		return;
@@ -294,10 +309,15 @@ void dma_pool_destroy(struct dma_pool *p
 	device_remove_file(pool->dev, &dev_attr_pools);
 	mutex_unlock(&pools_reg_lock);

-	while ((page = list_first_entry_or_null(&pool->page_list,
-						struct dma_page,
-						page_list))) {
-		pool_free_page(pool, page);
+	for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
+		struct dma_page *page;
+
+		while ((page = list_first_entry_or_null(
+				&pool->page_list[list_idx],
+				struct dma_page,
+				dma_list))) {
+			pool_free_page(pool, page);
+		}
 	}

 	kfree(pool);
@@ -325,10 +345,11 @@ void *dma_pool_alloc(struct dma_pool *po
 	might_sleep_if(gfpflags_allow_blocking(mem_flags));

 	spin_lock_irqsave(&pool->lock, flags);
-	list_for_each_entry(page, &pool->page_list, page_list) {
-		if (page->offset < pool->allocation)
-			goto ready;
-	}
+	page = list_first_entry_or_null(&pool->page_list[POOL_AVAIL_IDX],
+					struct dma_page,
+					dma_list);
+	if (page)
+		goto ready;

 	/* pool_alloc_page() might sleep, so temporarily drop &pool->lock */
 	spin_unlock_irqrestore(&pool->lock, flags);
@@ -339,11 +360,16 @@ void *dma_pool_alloc(struct dma_pool *po

 	spin_lock_irqsave(&pool->lock, flags);

-	list_add(&page->page_list, &pool->page_list);
+	list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
 ready:
 	page->in_use++;
 	offset = page->offset;
 	page->offset = *(int *)(page->vaddr + offset);
+	if (page->offset >= pool->allocation) {
+		/* Move page from the "available" list to the "full" list. */
+		list_del(&page->dma_list);
+		list_add(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);
+	}
 	retval = offset + page->vaddr;
 	*handle = offset + page->dma;
 #ifdef DMAPOOL_DEBUG
@@ -381,13 +407,19 @@ EXPORT_SYMBOL(dma_pool_alloc);

 static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
 {
-	struct dma_page *page;
+	int list_idx;
+
+	for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
+		struct dma_page *page;

-	list_for_each_entry(page, &pool->page_list, page_list) {
-		if (dma < page->dma)
-			continue;
-		if ((dma - page->dma) < pool->allocation)
-			return page;
+		list_for_each_entry(page,
+				    &pool->page_list[list_idx],
+				    dma_list) {
+			if (dma < page->dma)
+				continue;
+			if ((dma - page->dma) < pool->allocation)
+				return page;
+		}
 	}
 	return NULL;
 }
@@ -444,6 +476,11 @@ void dma_pool_free(struct dma_pool *pool
 #endif

 	page->in_use--;
+	if (page->offset >= pool->allocation) {
+		/* Move page from the "full" list to the "available" list. */
+		list_del(&page->dma_list);
+		list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
+	}
 	*(int *)vaddr = page->offset;
 	page->offset = offset;
 	/*

From patchwork Thu Aug 2 19:59:15 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554103
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 5/9] dmapool: rename fields in dma_page
Date: Thu, 2 Aug 2018 15:59:15 -0400
Rename fields in 'struct dma_page' in preparation for moving them into
'struct page'.  No functional changes.

in_use -> dma_in_use
offset -> dma_free_o

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
--- linux/mm/dmapool.c.orig	2018-08-02 10:03:46.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 10:06:32.000000000 -0400
@@ -65,8 +65,8 @@ struct dma_page {		/* cacheable header f
 	struct list_head dma_list;
 	void *vaddr;
 	dma_addr_t dma;
-	unsigned int in_use;
-	unsigned int offset;
+	unsigned int dma_in_use;
+	unsigned int dma_free_o;
 };

@@ -101,7 +101,7 @@ show_pools(struct device *dev, struct de
 					    &pool->page_list[list_idx],
 					    dma_list) {
 				pages++;
-				blocks += page->in_use;
+				blocks += page->dma_in_use;
 			}
 		}
 		spin_unlock_irq(&pool->lock);
@@ -248,8 +248,8 @@ static struct dma_page *pool_alloc_page(
 		memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
 		pool_initialise_page(pool, page);
-		page->in_use = 0;
-		page->offset = 0;
+		page->dma_in_use = 0;
+		page->dma_free_o = 0;
 	} else {
 		kfree(page);
 		page = NULL;
@@ -259,7 +259,7 @@ static struct dma_page *pool_alloc_page(

 static inline bool is_page_busy(struct dma_page *page)
 {
-	return page->in_use != 0;
+	return page->dma_in_use != 0;
 }

 static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
@@ -362,10 +362,10 @@ void *dma_pool_alloc(struct dma_pool *po
 		list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
 ready:
-	page->in_use++;
-	offset = page->offset;
-	page->offset = *(int *)(page->vaddr + offset);
-	if (page->offset >= pool->allocation) {
+	page->dma_in_use++;
+	offset = page->dma_free_o;
+	page->dma_free_o = *(int *)(page->vaddr + offset);
+	if (page->dma_free_o >= pool->allocation) {
 		/* Move page from the "available" list to the "full" list. */
 		list_del(&page->dma_list);
 		list_add(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);
@@ -376,8 +376,8 @@ void *dma_pool_alloc(struct dma_pool *po
 	{
 		int i;
 		u8 *data = retval;
-		/* page->offset is stored in first 4 bytes */
-		for (i = sizeof(page->offset); i < pool->size; i++) {
+		/* page->dma_free_o is stored in first 4 bytes */
+		for (i = sizeof(page->dma_free_o); i < pool->size; i++) {
 			if (data[i] == POOL_POISON_FREED)
 				continue;
 			dev_err(pool->dev,
@@ -459,7 +459,7 @@ void dma_pool_free(struct dma_pool *pool
 		return;
 	}
 	{
-		unsigned int chain = page->offset;
+		unsigned int chain = page->dma_free_o;
 		while (chain < pool->allocation) {
 			if (chain != offset) {
 				chain = *(int *)(page->vaddr + chain);
 				continue;
 			}
@@ -475,14 +475,14 @@ void dma_pool_free(struct dma_pool *pool
 	memset(vaddr, POOL_POISON_FREED, pool->size);
 #endif

-	page->in_use--;
-	if (page->offset >= pool->allocation) {
+	page->dma_in_use--;
+	if (page->dma_free_o >= pool->allocation) {
 		/* Move page from the "full" list to the "available" list. */
 		list_del(&page->dma_list);
 		list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
 	}
-	*(int *)vaddr = page->offset;
-	page->offset = offset;
+	*(int *)vaddr = page->dma_free_o;
+	page->dma_free_o = offset;
 	/*
 	 * Resist a temptation to do
 	 *    if (!is_page_busy(page)) pool_free_page(pool, page);

From patchwork Thu Aug 2 19:59:53 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554107
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 6/9] dmapool: improve scalability of dma_pool_free
Date: Thu, 2 Aug 2018 15:59:53 -0400

dma_pool_free() scales poorly when the pool contains many pages because
pool_find_page() does a linear scan of all allocated pages.
Improve its scalability by replacing the linear scan with
virt_to_page() and storing dmapool private data directly in
'struct page', thereby eliminating 'struct dma_page'.  In big O
notation, this improves the algorithm from O(n^2) to O(n) while also
reducing memory usage.

Thanks to Matthew Wilcox for the suggestion to use struct page.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
Completely rewritten since v1.

Prior to this patch, if you passed dma_pool_free() a bad dma address,
then pool_find_page() wouldn't be able to find it in the pool, so it
would print an error and return.  But this patch removes
pool_find_page(), so I moved one of the faster sanity checks from
DMAPOOL_DEBUG to always-enabled.  It should be cheap enough, especially
given the speed improvement this patch set gives overall.  The check
will at least verify that the page was probably allocated by a dma pool
(by checking that page->dma is consistent with the passed-in dma
address), although it can't verify that it was the same pool that is
being passed to dma_pool_free().  I would have liked to add a pointer
from the 'struct page' back to the 'struct dma_pool', but there isn't
enough space in 'struct page' without going through painful measures
that aren't worth it for a debug check.
---
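Condensed from the diff below into a short sketch (not the complete
code), the new free path derives everything it needs from the block's
virtual address:

	struct page *page;
	unsigned int offset;

	page = virt_to_page(vaddr);		/* O(1): no list walk */
	offset = offset_in_page(vaddr);

	/* Always-enabled sanity check: the caller's dma handle must be
	 * consistent with the dma base recorded in the page when it
	 * was allocated. */
	if (unlikely((dma - page->dma) != offset)) {
		dev_err(pool->dev,
			"dma_pool_free %s, %p (bad vaddr)/%pad (or bad dma)\n",
			pool->name, vaddr, &dma);
		return;
	}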
--- linux/include/linux/mm_types.h.orig	2018-08-01 17:59:46.000000000 -0400
+++ linux/include/linux/mm_types.h	2018-08-01 17:59:56.000000000 -0400
@@ -153,6 +153,12 @@ struct page {
 			unsigned long _zd_pad_1;	/* uses mapping */
 		};

+		struct {	/* dma_pool pages */
+			struct list_head dma_list;
+			dma_addr_t dma;
+			unsigned int dma_free_o;
+		};
+
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
 	};
@@ -174,6 +180,8 @@ struct page {
 		unsigned int active;		/* SLAB */
 		int units;			/* SLOB */
+
+		unsigned int dma_in_use;	/* dma_pool pages */
 	};

 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--- linux/mm/dmapool.c.orig	2018-08-02 10:07:47.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 10:10:38.000000000 -0400
@@ -25,6 +25,10 @@
  * list tracks pages that have one or more free blocks, and the 'full' list
  * tracks pages that have no free blocks.  Pages are moved from one list to
  * the other as their blocks are allocated and freed.
+ *
+ * When allocating DMA pages, we use some available space in 'struct page' to
+ * store data private to dmapool; search 'dma_pool' in the definition of
+ * 'struct page' for details.
  */

 #include

@@ -61,14 +65,6 @@ struct dma_pool {		/* the pool */
 	struct list_head pools;
 };

-struct dma_page {		/* cacheable header for 'allocation' bytes */
-	struct list_head dma_list;
-	void *vaddr;
-	dma_addr_t dma;
-	unsigned int dma_in_use;
-	unsigned int dma_free_o;
-};
-
 static DEFINE_MUTEX(pools_lock);
 static DEFINE_MUTEX(pools_reg_lock);

@@ -95,7 +91,7 @@ show_pools(struct device *dev, struct de
 		spin_lock_irq(&pool->lock);
 		for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
-			struct dma_page *page;
+			struct page *page;

 			list_for_each_entry(page,
 					    &pool->page_list[list_idx],
@@ -218,7 +214,7 @@ struct dma_pool *dma_pool_create(const c
 }
 EXPORT_SYMBOL(dma_pool_create);

-static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
+static void pool_initialize_free_block_list(struct dma_pool *pool, void *vaddr)
 {
 	unsigned int offset = 0;
 	unsigned int next_boundary = pool->boundary;
@@ -229,47 +225,57 @@ static void pool_initialise_page(struct
 			next = next_boundary;
 			next_boundary += pool->boundary;
 		}
-		*(int *)(page->vaddr + offset) = next;
+		*(int *)(vaddr + offset) = next;
 		offset = next;
 	} while (offset < pool->allocation);
 }

-static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
+static struct page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 {
-	struct dma_page *page;
+	struct page *page;
+	dma_addr_t dma;
+	void *vaddr;

-	page = kmalloc(sizeof(*page), mem_flags);
-	if (!page)
+	vaddr = dma_alloc_coherent(pool->dev, pool->allocation, &dma,
+				   mem_flags);
+	if (!vaddr)
 		return NULL;

-	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
-					 &page->dma, mem_flags);
-	if (page->vaddr) {
+
 #ifdef DMAPOOL_DEBUG
-		memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
+	memset(vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-		pool_initialise_page(pool, page);
-		page->dma_in_use = 0;
-		page->dma_free_o = 0;
-	} else {
-		kfree(page);
-		page = NULL;
-	}
+	pool_initialize_free_block_list(pool, vaddr);
+
+	page = virt_to_page(vaddr);
+	page->dma = dma;
+	page->dma_free_o = 0;
+	page->dma_in_use = 0;
+
 	return page;
 }

-static inline bool is_page_busy(struct dma_page *page)
+static inline bool is_page_busy(struct page *page)
 {
 	return page->dma_in_use != 0;
 }

-static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
+static void pool_free_page(struct dma_pool *pool, struct page *page)
 {
-	void *vaddr = page->vaddr;
+	/* Save local copies of some page fields. */
+	void *vaddr = page_to_virt(page);
+	bool busy = is_page_busy(page);
 	dma_addr_t dma = page->dma;

 	list_del(&page->dma_list);

-	if (is_page_busy(page)) {
+	/* Clear all the page fields we use. */
+	page->dma_list.next = NULL;
+	page->dma_list.prev = NULL;
+	page->dma = 0;
+	page->dma_free_o = 0;
+	page_mapcount_reset(page);	/* clear dma_in_use */
+
+	if (busy) {
 		dev_err(pool->dev,
 			"dma_pool_destroy %s, %p busy\n",
 			pool->name, vaddr);
@@ -280,7 +286,6 @@ static void pool_free_page(struct dma_po
 #endif
 		dma_free_coherent(pool->dev, pool->allocation, vaddr, dma);
 	}
-	kfree(page);
 }

 /**
@@ -310,11 +315,11 @@ void dma_pool_destroy(struct dma_pool *p
 	mutex_unlock(&pools_reg_lock);

 	for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
-		struct dma_page *page;
+		struct page *page;

 		while ((page = list_first_entry_or_null(
 				&pool->page_list[list_idx],
-				struct dma_page,
+				struct page,
 				dma_list))) {
 			pool_free_page(pool, page);
 		}
@@ -338,15 +343,16 @@ void *dma_pool_alloc(struct dma_pool *po
 			 dma_addr_t *handle)
 {
 	unsigned long flags;
-	struct dma_page *page;
+	struct page *page;
 	size_t offset;
 	void *retval;
+	void *vaddr;

 	might_sleep_if(gfpflags_allow_blocking(mem_flags));

 	spin_lock_irqsave(&pool->lock, flags);
 	page = list_first_entry_or_null(&pool->page_list[POOL_AVAIL_IDX],
-					struct dma_page,
+					struct page,
 					dma_list);
 	if (page)
 		goto ready;
@@ -362,15 +368,16 @@ void *dma_pool_alloc(struct dma_pool *po
 	list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);

 ready:
+	vaddr = page_to_virt(page);
 	page->dma_in_use++;
 	offset = page->dma_free_o;
-	page->dma_free_o = *(int *)(page->vaddr + offset);
+	page->dma_free_o = *(int *)(vaddr + offset);
 	if (page->dma_free_o >= pool->allocation) {
 		/* Move page from the "available" list to the "full" list. */
 		list_del(&page->dma_list);
 		list_add(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);
 	}
-	retval = offset + page->vaddr;
+	retval = offset + vaddr;
 	*handle = offset + page->dma;
 #ifdef DMAPOOL_DEBUG
 	{
@@ -405,25 +412,6 @@ void *dma_pool_alloc(struct dma_pool *po
 }
 EXPORT_SYMBOL(dma_pool_alloc);

-static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
-{
-	int list_idx;
-
-	for (list_idx = 0; list_idx < POOL_N_LISTS; list_idx++) {
-		struct dma_page *page;
-
-		list_for_each_entry(page,
-				    &pool->page_list[list_idx],
-				    dma_list) {
-			if (dma < page->dma)
-				continue;
-			if ((dma - page->dma) < pool->allocation)
-				return page;
-		}
-	}
-	return NULL;
-}
-
 /**
  * dma_pool_free - put block back into dma pool
  * @pool: the dma pool holding the block
@@ -435,34 +423,35 @@ static struct dma_page *pool_find_page(s
  */
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 {
-	struct dma_page *page;
+	struct page *page;
 	unsigned long flags;
 	unsigned int offset;

-	spin_lock_irqsave(&pool->lock, flags);
-	page = pool_find_page(pool, dma);
-	if (!page) {
-		spin_unlock_irqrestore(&pool->lock, flags);
+	if (unlikely(!virt_addr_valid(vaddr))) {
 		dev_err(pool->dev,
-			"dma_pool_free %s, %p/%lx (bad dma)\n",
-			pool->name, vaddr, (unsigned long)dma);
+			"dma_pool_free %s, %p (bad vaddr)/%pad\n",
+			pool->name, vaddr, &dma);
 		return;
 	}

-	offset = vaddr - page->vaddr;
-#ifdef DMAPOOL_DEBUG
-	if ((dma - page->dma) != offset) {
-		spin_unlock_irqrestore(&pool->lock, flags);
+	page = virt_to_page(vaddr);
+	offset = offset_in_page(vaddr);
+
+	if (unlikely((dma - page->dma) != offset)) {
 		dev_err(pool->dev,
-			"dma_pool_free %s, %p (bad vaddr)/%pad\n",
+			"dma_pool_free %s, %p (bad vaddr)/%pad (or bad dma)\n",
 			pool->name, vaddr, &dma);
 		return;
 	}
+
+	spin_lock_irqsave(&pool->lock, flags);
+#ifdef DMAPOOL_DEBUG
 	{
+		void *page_vaddr = vaddr - offset;
 		unsigned int chain = page->dma_free_o;
 		while (chain < pool->allocation) {
 			if (chain != offset) {
-				chain = *(int *)(page->vaddr + chain);
+				chain = *(int *)(page_vaddr + chain);
 				continue;
 			}
 			spin_unlock_irqrestore(&pool->lock, flags);

From patchwork Thu Aug 2 20:00:37 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554121
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 7/9] dmapool: debug: prevent endless loop in case of corruption
Message-ID: <36e483e9-d779-497a-551e-32f96e184b49@cybernetics.com>
Date: Thu, 2 Aug 2018 16:00:37 -0400

Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy
driver corrupts DMA pool memory.
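The idea of the guard, as a standalone sketch (illustrative index-table
freelist, not the patch's code): cap the walk at the maximum possible
number of free blocks, so a circular freelist produces an error instead
of an endless loop.

#include <stdbool.h>

/* next[] encodes the freelist: next[i] is the block after block i, and
 * any value >= nblocks terminates the list. */
static bool freelist_walk_is_bounded(const unsigned int *next,
				     unsigned int head, unsigned int nblocks)
{
	unsigned int steps = 0;
	unsigned int i = head;

	while (i < nblocks) {
		if (++steps > nblocks)
			return false;	/* cycle: freelist corrupted */
		i = next[i];
	}
	return true;
}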
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
--- linux/mm/dmapool.c.orig	2018-08-02 10:14:25.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 10:16:17.000000000 -0400
@@ -449,16 +449,35 @@ void dma_pool_free(struct dma_pool *pool
 	{
 		void *page_vaddr = vaddr - offset;
 		unsigned int chain = page->dma_free_o;
+		size_t total_free = 0;
+
 		while (chain < pool->allocation) {
-			if (chain != offset) {
-				chain = *(int *)(page_vaddr + chain);
-				continue;
+			if (unlikely(chain == offset)) {
+				spin_unlock_irqrestore(&pool->lock, flags);
+				dev_err(pool->dev,
+					"dma_pool_free %s, dma %pad already free\n",
+					pool->name, &dma);
+				return;
+			}
+
+			/*
+			 * The calculation of the number of blocks per
+			 * allocation is actually more complicated than this
+			 * because of the boundary value.  But this comparison
+			 * does not need to be exact; it just needs to prevent
+			 * an endless loop in case a buggy driver causes a
+			 * circular loop in the freelist.
+			 */
+			total_free += pool->size;
+			if (unlikely(total_free >= pool->allocation)) {
+				spin_unlock_irqrestore(&pool->lock, flags);
+				dev_err(pool->dev,
+					"dma_pool_free %s, freelist corrupted\n",
+					pool->name);
+				return;
 			}
-			spin_unlock_irqrestore(&pool->lock, flags);
-			dev_err(pool->dev,
-				"dma_pool_free %s, dma %pad already free\n",
-				pool->name, &dma);
-			return;
+
+			chain = *(int *)(page_vaddr + chain);
 		}
 	}
 	memset(vaddr, POOL_POISON_FREED, pool->size);

From patchwork Thu Aug 2 20:01:12 2018
X-Patchwork-Submitter: Tony Battersby
X-Patchwork-Id: 10554143
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH v2 8/9] dmapool: reduce footprint in struct page
Message-ID: <0ccfd31b-0a3f-9ae8-85c8-e176cd5453a9@cybernetics.com>
Date: Thu, 2 Aug 2018 16:01:12 -0400
This is my attempt to shrink 'dma_free_o' and 'dma_in_use' in
'struct page' (originally 'offset' and 'in_use' in 'struct dma_page')
to 16-bit so that it is unnecessary to use the '_mapcount' field of
'struct page'.  However, it adds complexity and makes allocating and
freeing up to 20% slower for little gain, so I am NOT recommending
that it be merged at this time.  I am posting it just for reference in
case someone finds it useful in the future.

The main difficulty is supporting archs that have PAGE_SIZE > 64 KiB,
for which a 16-bit byte offset is insufficient to cover the entire
page.  So I took the approach of converting everything from a "byte
offset" into a "block index".  That way the code can split any
PAGE_SIZE into as many as 65535 blocks (one 16-bit index value is
reserved for the list terminator).  For example, with PAGE_SIZE of
1 MiB, you get 65535 blocks for 'size' <= 16.  But that introduces a
lot of ugly math due to the 'boundary' checking, which makes the code
slower and more complex.

I wrote a standalone program that iterates over all the combinations
of PAGE_SIZE, 'size', and 'boundary', and performs a series of
consistency checks on pool_blk_idx_to_offset(),
pool_offset_to_blk_idx(), and pool_initialize_free_block_list().  The
math may be ugly but I am pretty sure it is correct.

One of the nice things about this is that dma_pool_free() can do some
additional sanity checks:

*) Check that the offset of the passed-in address corresponds to a
   valid block offset.

*) With DMAPOOL_DEBUG enabled, check that the number of blocks in the
   freelist exactly matches the number that should be there.  This
   improves the debug check I added in a previous patch by adding the
   calculation for pool->blks_per_alloc.

NOT for merging.
---
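A worked example of the index math (standalone sketch with assumed
parameters, not the kernel code): with size = 384 and boundary = 2048
(boundary_shift = 11) in an allocation of 4096, blks_per_boundary = 5
and blks_per_alloc = 10, and block 7 maps to offset
(7/5 << 11) + (7%5)*384 = 2816.

#include <assert.h>
#include <stdio.h>

/* Assumed pool parameters: size=384, boundary=2048, allocation=4096. */
#define BLK_SIZE	  384u
#define BOUNDARY_SHIFT	  11u
#define BLKS_PER_BOUNDARY (2048u / BLK_SIZE)	/* 5 */

static unsigned int blk_idx_to_offset(unsigned int idx)
{
	return ((idx / BLKS_PER_BOUNDARY) << BOUNDARY_SHIFT) +
	       (idx % BLKS_PER_BOUNDARY) * BLK_SIZE;
}

int main(void)
{
	/* blks_per_alloc = (4096/2048)*5 + (4096%2048)/384 = 10 blocks */
	for (unsigned int idx = 0; idx < 10; idx++)
		printf("block %u -> offset %u\n", idx, blk_idx_to_offset(idx));

	/* block 7 -> (7/5)<<11 + (7%5)*384 = 2048 + 768 = 2816 */
	assert(blk_idx_to_offset(7) == 2816);
	return 0;
}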
--- linux/include/linux/mm_types.h.orig	2018-08-01 12:25:25.000000000 -0400
+++ linux/include/linux/mm_types.h	2018-08-01 12:25:52.000000000 -0400
@@ -156,7 +156,8 @@ struct page {
 		struct {	/* dma_pool pages */
 			struct list_head dma_list;
 			dma_addr_t dma;
-			unsigned int dma_free_o;
+			unsigned short dma_free_idx;
+			unsigned short dma_in_use;
 		};

 		/** @rcu_head: You can use this to free a page by RCU. */
@@ -180,8 +181,6 @@ struct page {
 		unsigned int active;		/* SLAB */
 		int units;			/* SLOB */
-
-		unsigned int dma_in_use;	/* dma_pool pages */
 	};

 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--- linux/mm/dmapool.c.orig	2018-08-02 14:02:42.000000000 -0400
+++ linux/mm/dmapool.c	2018-08-02 14:03:31.000000000 -0400
@@ -51,16 +51,25 @@
 #define DMAPOOL_DEBUG 1
 #endif

+/*
+ * This matches the type of struct page::dma_free_idx, which is 16-bit to
+ * conserve space in struct page.
+ */
+typedef unsigned short pool_idx_t;
+#define POOL_IDX_MAX USHRT_MAX
+
 struct dma_pool {		/* the pool */
 #define POOL_FULL_IDX   0
 #define POOL_AVAIL_IDX  1
 #define POOL_N_LISTS    2
 	struct list_head page_list[POOL_N_LISTS];
 	spinlock_t lock;
-	size_t size;
 	struct device *dev;
-	size_t allocation;
-	size_t boundary;
+	unsigned int size;
+	unsigned int allocation;
+	unsigned int boundary_shift;
+	unsigned int blks_per_boundary;
+	unsigned int blks_per_alloc;
 	char name[32];
 	struct list_head pools;
 };
@@ -103,9 +112,9 @@ show_pools(struct device *dev, struct de
 		spin_unlock_irq(&pool->lock);

 		/* per-pool info, no real statistics yet */
-		temp = scnprintf(next, size, "%-16s %4u %4zu %4zu %2u\n",
+		temp = scnprintf(next, size, "%-16s %4u %4u %4u %2u\n",
 				 pool->name, blocks,
-				 pages * (pool->allocation / pool->size),
+				 pages * pool->blks_per_alloc,
 				 pool->size, pages);
 		size -= temp;
 		next += temp;
@@ -141,6 +150,7 @@ static DEVICE_ATTR(pools, 0444, show_pool
 struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 				 size_t size, size_t align, size_t boundary)
 {
+	unsigned int boundary_shift;
 	struct dma_pool *retval;
 	size_t allocation;
 	bool empty = false;
@@ -150,10 +160,10 @@ struct dma_pool *dma_pool_create(const c
 	else if (align & (align - 1))
 		return NULL;

-	if (size == 0)
+	if (size == 0 || size > SZ_2G)
 		return NULL;
-	else if (size < 4)
-		size = 4;
+	else if (size < sizeof(pool_idx_t))
+		size = sizeof(pool_idx_t);

 	if ((size % align) != 0)
 		size = ALIGN(size, align);
@@ -165,6 +175,9 @@ struct dma_pool *dma_pool_create(const c
 	else if ((boundary < size) || (boundary & (boundary - 1)))
 		return NULL;

+	boundary_shift = get_count_order_long(min(boundary, allocation));
+	boundary = 1U << boundary_shift;
+
 	retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
 	if (!retval)
 		return retval;
@@ -177,8 +190,29 @@ struct dma_pool *dma_pool_create(const c
 	INIT_LIST_HEAD(&retval->page_list[1]);
 	spin_lock_init(&retval->lock);
 	retval->size = size;
-	retval->boundary = boundary;
 	retval->allocation = allocation;
+	retval->boundary_shift = boundary_shift;
+	retval->blks_per_boundary = boundary / size;
+	retval->blks_per_alloc =
+		(allocation / boundary) * retval->blks_per_boundary +
+		(allocation % boundary) / size;
+	if (boundary >= allocation || boundary % size == 0) {
+		/*
+		 * When the blocks are packed together, an individual block
+		 * will never cross the boundary, so the boundary doesn't
+		 * matter in this case.  Enable some faster codepaths that skip
+		 * boundary calculations for a small speedup.
+		 */
+		retval->blks_per_boundary = 0;
+	}
+	if (retval->blks_per_alloc > POOL_IDX_MAX) {
+		/*
+		 * This would only affect archs with large PAGE_SIZE.  Limit
+		 * the total number of blocks per allocation to avoid
+		 * overflowing dma_in_use and dma_free_idx.
+		 */
+		retval->blks_per_alloc = POOL_IDX_MAX;
+	}

 	INIT_LIST_HEAD(&retval->pools);
@@ -214,20 +248,73 @@ struct dma_pool *dma_pool_create(const c
 }
 EXPORT_SYMBOL(dma_pool_create);

+/*
+ * Convert the index of a block of size pool->size to its offset within an
+ * allocated chunk of memory of size pool->allocation.
+ */
+static unsigned int pool_blk_idx_to_offset(struct dma_pool *pool,
+					   unsigned int blk_idx)
+{
+	unsigned int offset;
+
+	if (pool->blks_per_boundary == 0) {
+		offset = blk_idx * pool->size;
+	} else {
+		offset = ((blk_idx / pool->blks_per_boundary) <<
+			  pool->boundary_shift) +
+			 (blk_idx % pool->blks_per_boundary) * pool->size;
+	}
+	return offset;
+}
+
+/*
+ * Convert an offset within an allocated chunk of memory of size
+ * pool->allocation to the index of the possibly-smaller block of size
+ * pool->size.  If the given offset is not located at the beginning of a valid
+ * block, then the return value will be >= pool->blks_per_alloc.
+ */
+static unsigned int pool_offset_to_blk_idx(struct dma_pool *pool,
+					   unsigned int offset)
+{
+	unsigned int blk_idx;
+
+	if (pool->blks_per_boundary == 0) {
+		blk_idx = (likely(offset % pool->size == 0))
+			  ? (offset / pool->size)
+			  : pool->blks_per_alloc;
+	} else {
+		unsigned int offset_within_boundary =
+			offset & ((1U << pool->boundary_shift) - 1);
+		unsigned int idx_within_boundary =
+			offset_within_boundary / pool->size;
+
+		if (likely(offset_within_boundary % pool->size == 0 &&
+			   idx_within_boundary < pool->blks_per_boundary)) {
+			blk_idx = (offset >> pool->boundary_shift) *
+				  pool->blks_per_boundary +
+				  idx_within_boundary;
+		} else {
+			blk_idx = pool->blks_per_alloc;
+		}
+	}
+	return blk_idx;
+}
+
 static void pool_initialize_free_block_list(struct dma_pool *pool, void *vaddr)
 {
+	unsigned int next_boundary = 1U << pool->boundary_shift;
 	unsigned int offset = 0;
-	unsigned int next_boundary = pool->boundary;
+	unsigned int i;
+
+	for (i = 0; i < pool->blks_per_alloc; i++) {
+		*(pool_idx_t *)(vaddr + offset) = (pool_idx_t) i + 1;

-	do {
-		unsigned int next = offset + pool->size;
-		if (unlikely((next + pool->size) > next_boundary)) {
-			next = next_boundary;
-			next_boundary += pool->boundary;
+		offset += pool->size;
+		if (unlikely((offset + pool->size) > next_boundary)) {
+			offset = next_boundary;
+			next_boundary += 1U << pool->boundary_shift;
 		}
-		*(int *)(vaddr + offset) = next;
-		offset = next;
-	} while (offset < pool->allocation);
+	}
 }

 static struct page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
@@ -248,7 +335,7 @@ static struct page *pool_alloc_page(stru

 	page = virt_to_page(vaddr);
 	page->dma = dma;
-	page->dma_free_o = 0;
+	page->dma_free_idx = 0;
 	page->dma_in_use = 0;

 	return page;
@@ -272,8 +359,8 @@ static void pool_free_page(struct dma_po
 	page->dma_list.prev = NULL;
 	page->dma = 0;
-	page->dma_free_o = 0;
-	page_mapcount_reset(page);	/* clear dma_in_use */
+	page->dma_free_idx = 0;
+	page->dma_in_use = 0;

 	if (busy) {
 		dev_err(pool->dev,
@@ -342,9 +429,10 @@ EXPORT_SYMBOL(dma_pool_destroy);
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 		     dma_addr_t *handle)
 {
+	unsigned int blk_idx;
+	unsigned int offset;
 	unsigned long flags;
 	struct page *page;
-	size_t offset;
 	void *retval;
 	void *vaddr;
@@ -370,9 +458,10 @@ void *dma_pool_alloc(struct dma_pool *po
 ready:
 	vaddr = page_to_virt(page);
 	page->dma_in_use++;
-	offset = page->dma_free_o;
-	page->dma_free_o = *(int *)(vaddr + offset);
-	if (page->dma_free_o >= pool->allocation) {
+	blk_idx = page->dma_free_idx;
+	offset = pool_blk_idx_to_offset(pool, blk_idx);
+	page->dma_free_idx = *(pool_idx_t *)(vaddr + offset);
+	if (page->dma_free_idx >= pool->blks_per_alloc) {
 		/* Move page from the "available" list to the "full" list. */
@@ -248,7 +335,7 @@ static struct page *pool_alloc_page(stru

 	page = virt_to_page(vaddr);
 	page->dma = dma;
-	page->dma_free_o = 0;
+	page->dma_free_idx = 0;
 	page->dma_in_use = 0;

 	return page;
@@ -272,8 +359,8 @@ static void pool_free_page(struct dma_po
 	page->dma_list.next = NULL;
 	page->dma_list.prev = NULL;
 	page->dma = 0;
-	page->dma_free_o = 0;
-	page_mapcount_reset(page);	/* clear dma_in_use */
+	page->dma_free_idx = 0;
+	page->dma_in_use = 0;

 	if (busy) {
 		dev_err(pool->dev,
@@ -342,9 +429,10 @@ EXPORT_SYMBOL(dma_pool_destroy);
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
		     dma_addr_t *handle)
 {
+	unsigned int blk_idx;
+	unsigned int offset;
 	unsigned long flags;
 	struct page *page;
-	size_t offset;
 	void *retval;
 	void *vaddr;
@@ -370,9 +458,10 @@ void *dma_pool_alloc(struct dma_po
  ready:
 	vaddr = page_to_virt(page);
 	page->dma_in_use++;
-	offset = page->dma_free_o;
-	page->dma_free_o = *(int *)(vaddr + offset);
-	if (page->dma_free_o >= pool->allocation) {
+	blk_idx = page->dma_free_idx;
+	offset = pool_blk_idx_to_offset(pool, blk_idx);
+	page->dma_free_idx = *(pool_idx_t *)(vaddr + offset);
+	if (page->dma_free_idx >= pool->blks_per_alloc) {
 		/*
		 * Move page from the "available" list to the "full" list.
		 */
 		list_del(&page->dma_list);
 		list_add(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);
@@ -383,8 +472,8 @@ void *dma_pool_alloc(struct dma_po
 	{
 		int i;
 		u8 *data = retval;
-		/* page->dma_free_o is stored in first 4 bytes */
-		for (i = sizeof(page->dma_free_o); i < pool->size; i++) {
+		/* a pool_idx_t is stored at the beginning of the block */
+		for (i = sizeof(pool_idx_t); i < pool->size; i++) {
 			if (data[i] == POOL_POISON_FREED)
				continue;
 			dev_err(pool->dev,
@@ -426,6 +515,7 @@ void dma_pool_free(struct dma_pool *pool
 	struct page *page;
 	unsigned long flags;
 	unsigned int offset;
+	unsigned int blk_idx;

 	if (unlikely(!virt_addr_valid(vaddr))) {
 		dev_err(pool->dev,
@@ -438,21 +528,28 @@ void dma_pool_free(struct dma_pool *pool
 	offset = offset_in_page(vaddr);
 	if (unlikely((dma - page->dma) != offset)) {
+ bad_vaddr:
 		dev_err(pool->dev,
			"dma_pool_free %s, %p (bad vaddr)/%pad (or bad dma)\n",
			pool->name, vaddr, &dma);
 		return;
 	}

+	blk_idx = pool_offset_to_blk_idx(pool, offset);
+	if (unlikely(blk_idx >= pool->blks_per_alloc))
+		goto bad_vaddr;
+
 	spin_lock_irqsave(&pool->lock, flags);
 #ifdef	DMAPOOL_DEBUG
 	{
 		void *page_vaddr = vaddr - offset;
-		unsigned int chain = page->dma_free_o;
-		size_t total_free = 0;
+		unsigned int chain_idx = page->dma_free_idx;
+		unsigned int n_free = 0;
+
+		while (chain_idx < pool->blks_per_alloc) {
+			unsigned int chain_offset;

-		while (chain < pool->allocation) {
-			if (unlikely(chain == offset)) {
+			if (unlikely(chain_idx == blk_idx)) {
				spin_unlock_irqrestore(&pool->lock, flags);
				dev_err(pool->dev,
					"dma_pool_free %s, dma %pad already free\n",
@@ -461,15 +558,15 @@ void dma_pool_free(struct dma_pool *pool
 			}

 			/*
-			 * The calculation of the number of blocks per
-			 * allocation is actually more complicated than this
-			 * because of the boundary value.  But this comparison
-			 * does not need to be exact; it just needs to prevent
-			 * an endless loop in case a buggy driver causes a
-			 * circular loop in the freelist.
+			 * A buggy driver could corrupt the freelist by
+			 * use-after-free, buffer overflow, etc.  Besides
+			 * checking for corruption, this also prevents an
+			 * endless loop in case corruption causes a circular
+			 * loop in the freelist.
			 */
-			total_free += pool->size;
-			if (unlikely(total_free >= pool->allocation)) {
+			if (unlikely(++n_free + page->dma_in_use >
+				     pool->blks_per_alloc)) {
+ freelist_corrupt:
				spin_unlock_irqrestore(&pool->lock, flags);
				dev_err(pool->dev,
					"dma_pool_free %s, freelist corrupted\n",
@@ -477,20 +574,24 @@ void dma_pool_free(struct dma_pool *pool
				return;
 			}

-			chain = *(int *)(page_vaddr + chain);
+			chain_offset = pool_blk_idx_to_offset(pool, chain_idx);
+			chain_idx =
+				*(pool_idx_t *) (page_vaddr + chain_offset);
 		}
+		if (n_free + page->dma_in_use != pool->blks_per_alloc)
+			goto freelist_corrupt;
 	}
 	memset(vaddr, POOL_POISON_FREED, pool->size);
 #endif

 	page->dma_in_use--;
-	if (page->dma_free_o >= pool->allocation) {
+	if (page->dma_free_idx >= pool->blks_per_alloc) {
 		/*
		 * Move page from the "full" list to the "available" list.
		 */
 		list_del(&page->dma_list);
 		list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
 	}
-	*(int *)vaddr = page->dma_free_o;
-	page->dma_free_o = offset;
+	*(pool_idx_t *)vaddr = page->dma_free_idx;
+	page->dma_free_idx = blk_idx;
 	/*
	 * Resist a temptation to do
	 *    if (!is_page_busy(page)) pool_free_page(pool, page);
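
The bounded walk in the DMAPOOL_DEBUG hunk above relies on one invariant:
every block is either in use or on the freelist, so a healthy chain can never
take more than blks_per_alloc - dma_in_use hops.  A standalone sketch of that
bound (invented names; a bare array of next-indices stands in for a real
pool page):

/* freelist_check_demo.c: cycle/corruption detection by counting hops */
#include <stdio.h>

#define BLKS_PER_ALLOC 8

/* returns 0 if the freelist is consistent, -1 if corrupt */
static int check_freelist(const unsigned short *next, unsigned int free_head,
			  unsigned int in_use)
{
	unsigned int chain = free_head, n_free = 0;

	while (chain < BLKS_PER_ALLOC) {
		if (++n_free + in_use > BLKS_PER_ALLOC)
			return -1;	/* too many hops: cycle or overlap */
		chain = next[chain];
	}
	/* the chain must account for every block not in use */
	return (n_free + in_use == BLKS_PER_ALLOC) ? 0 : -1;
}

int main(void)
{
	/* healthy: blocks 0 and 1 in use, blocks 2..7 chained and free */
	unsigned short next[BLKS_PER_ALLOC] = { 0, 0, 3, 4, 5, 6, 7, 8 };

	printf("healthy: %d\n", check_freelist(next, 2, 2));	/* 0 */

	next[7] = 2;	/* corrupt the chain into a cycle */
	printf("cyclic:  %d\n", check_freelist(next, 2, 2));	/* -1 */
	return 0;
}
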
From patchwork Thu Aug  2 20:01:47 2018
From: Tony Battersby
Subject: [PATCH v2 9/9] [SCSI] mpt3sas: replace chain_dma_pool
To: Matthew Wilcox, Christoph Hellwig, Marek Szyprowski, Sathya Prakash,
 Chaitra P B, Suganath Prabu Subramani, iommu@lists.linux-foundation.org,
 linux-mm@kvack.org, linux-scsi@vger.kernel.org,
 MPT-FusionLinux.pdl@broadcom.com
Date: Thu, 2 Aug 2018 16:01:47 -0400

Replace chain_dma_pool with direct calls to dma_alloc_coherent() and
dma_free_coherent().  Since the chain lookup can involve hundreds of
thousands of allocations, it is worthwhile to avoid the overhead of the
dma_pool API.

Signed-off-by: Tony Battersby
---
No changes since v1.
The original code called _base_release_memory_pools() before "goto out" if
dma_pool_alloc() failed, but this was unnecessary: mpt3sas_base_attach()
already calls _base_release_memory_pools() after "goto out_free_resources".

It may have been that way because the out-of-tree vendor driver (from
https://www.broadcom.com/support/download-search) has a slightly more
complicated error handler there: under some circumstances it adjusts
max_request_credit, calls _base_release_memory_pools(), and then does
"goto retry_allocation".  That logic is missing from the in-tree driver.
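
To make the chains_avail bookkeeping in the patch below concrete, here is a
userspace model of the paired loops in _base_allocate_chain_lookup() and
_base_release_chain_lookup(): only the first chain carved from each batch is
a real allocation, so replaying the same iteration on the free side releases
each batch exactly once.  In this sketch, malloc()/free() stand in for
dma_alloc_coherent()/dma_free_coherent(), and the chains_per_prp_buffer
starting offset, partial-initialization handling, and error unwinding are
omitted:

/* chain_carve_demo.c: batch carving with mirrored alloc/free loops */
#include <stdio.h>
#include <stdlib.h>

enum {
	DEPTH = 3,		/* stand-in for ioc->scsiio_depth */
	CHAINS_PER_IO = 4,	/* stand-in for ioc->chains_needed_per_io */
	SEG_SZ = 128,		/* aligned chain segment size */
	PER_ALLOC = 5,		/* chains carved from one batch */
};

static void *grid[DEPTH][CHAINS_PER_IO];

static int carve_all(void)
{
	unsigned int avail = 0;
	char *vaddr = NULL;
	int i, j;

	for (i = 0; i < DEPTH; i++) {
		for (j = 0; j < CHAINS_PER_IO; j++) {
			if (avail == 0) {	/* start a new batch */
				vaddr = malloc((size_t)SEG_SZ * PER_ALLOC);
				if (!vaddr)
					return -1;
				avail = PER_ALLOC;
			}
			grid[i][j] = vaddr;	/* carve the next segment */
			vaddr += SEG_SZ;
			avail--;
		}
	}
	return 0;
}

static void release_all(void)
{
	unsigned int avail = 0;
	int i, j;

	/* same loop shape: avail hits 0 exactly at each batch head */
	for (i = 0; i < DEPTH; i++) {
		for (j = 0; j < CHAINS_PER_IO; j++) {
			if (avail == 0) {	/* batch head: real allocation */
				free(grid[i][j]);
				avail = PER_ALLOC;
			}
			avail--;
		}
	}
}

int main(void)
{
	if (carve_all())
		return 1;
	printf("first two chains %u bytes apart\n",
	       (unsigned int)((char *)grid[0][1] - (char *)grid[0][0]));
	release_all();
	return 0;
}
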
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 569392d..2cb567a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -4224,6 +4224,134 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 }

 /**
+ * _base_release_chain_lookup - release chain_lookup memory pools
+ * @ioc: per adapter object
+ *
+ * Free memory allocated from _base_allocate_chain_lookup.
+ */
+static void
+_base_release_chain_lookup(struct MPT3SAS_ADAPTER *ioc)
+{
+	unsigned int chains_avail = 0;
+	struct chain_tracker *ct;
+	int i, j;
+
+	if (!ioc->chain_lookup)
+		return;
+
+	/*
+	 * NOTE
+	 *
+	 * To make this code easier to understand and maintain, the for loops
+	 * and the management of the chains_avail value are designed to be
+	 * similar to the _base_allocate_chain_lookup() function.  That way,
+	 * the code for freeing the memory is similar to the code for
+	 * allocating the memory.
+	 */
+	for (i = 0; i < ioc->scsiio_depth; i++) {
+		if (!ioc->chain_lookup[i].chains_per_smid)
+			break;
+
+		for (j = ioc->chains_per_prp_buffer;
+		     j < ioc->chains_needed_per_io; j++) {
+			/*
+			 * If chains_avail is 0, then the chain represents a
+			 * real allocation, so free it.
+			 *
+			 * If chains_avail is nonzero, then the chain was
+			 * initialized at an offset from a previous allocation,
+			 * so don't free it.
+			 */
+			if (chains_avail == 0) {
+				ct = &ioc->chain_lookup[i].chains_per_smid[j];
+				if (ct->chain_buffer)
+					dma_free_coherent(
+						&ioc->pdev->dev,
+						ioc->chain_allocation_sz,
+						ct->chain_buffer,
+						ct->chain_buffer_dma);
+				chains_avail = ioc->chains_per_allocation;
+			}
+			chains_avail--;
+		}
+		kfree(ioc->chain_lookup[i].chains_per_smid);
+	}
+
+	kfree(ioc->chain_lookup);
+	ioc->chain_lookup = NULL;
+}
+
+/**
+ * _base_allocate_chain_lookup - allocate chain_lookup memory pools
+ * @ioc: per adapter object
+ * @total_sz: external value that tracks total amount of memory allocated
+ *
+ * Return: 0 success, anything else error
+ */
+static int
+_base_allocate_chain_lookup(struct MPT3SAS_ADAPTER *ioc, u32 *total_sz)
+{
+	unsigned int aligned_chain_segment_sz;
+	const unsigned int align = 16;
+	unsigned int chains_avail = 0;
+	struct chain_tracker *ct;
+	dma_addr_t dma_addr = 0;
+	void *vaddr = NULL;
+	int i, j;
+
+	/* Round up the allocation size for alignment. */
+	aligned_chain_segment_sz = ioc->chain_segment_sz;
+	if (aligned_chain_segment_sz % align != 0)
+		aligned_chain_segment_sz =
+			ALIGN(aligned_chain_segment_sz, align);
+
+	/* Allocate a page of chain buffers at a time. */
+	ioc->chain_allocation_sz =
+		max_t(unsigned int, aligned_chain_segment_sz, PAGE_SIZE);
+
+	/* Calculate how many chain buffers we can get from one allocation. */
+	ioc->chains_per_allocation =
+		ioc->chain_allocation_sz / aligned_chain_segment_sz;
+
+	for (i = 0; i < ioc->scsiio_depth; i++) {
+		for (j = ioc->chains_per_prp_buffer;
+		     j < ioc->chains_needed_per_io; j++) {
+			/*
+			 * Check if there are any chain buffers left in the
+			 * previously-allocated block.
+			 */
+			if (chains_avail == 0) {
+				/* Allocate a new block of chain buffers. */
+				vaddr = dma_alloc_coherent(
+					&ioc->pdev->dev,
+					ioc->chain_allocation_sz,
+					&dma_addr,
+					GFP_KERNEL);
+				if (!vaddr) {
+					pr_err(MPT3SAS_FMT
+					       "chain_lookup: dma_alloc_coherent failed\n",
+					       ioc->name);
+					return -1;
+				}
+				chains_avail = ioc->chains_per_allocation;
+			}
+
+			ct = &ioc->chain_lookup[i].chains_per_smid[j];
+			ct->chain_buffer = vaddr;
+			ct->chain_buffer_dma = dma_addr;
+
+			/* Go to the next chain buffer in the block. */
+			vaddr += aligned_chain_segment_sz;
+			dma_addr += aligned_chain_segment_sz;
+			*total_sz += ioc->chain_segment_sz;
+			chains_avail--;
+		}
+	}
+
+	return 0;
+}
+
+/**
  * _base_release_memory_pools - release memory
  * @ioc: per adapter object
  *
@@ -4235,8 +4363,6 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
 {
 	int i = 0;
-	int j = 0;
-	struct chain_tracker *ct;
 	struct reply_post_struct *rps;

 	dexitprintk(ioc, pr_info(MPT3SAS_FMT "%s\n", ioc->name,
@@ -4326,22 +4452,7 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 	kfree(ioc->hpr_lookup);
 	kfree(ioc->internal_lookup);
-	if (ioc->chain_lookup) {
-		for (i = 0; i < ioc->scsiio_depth; i++) {
-			for (j = ioc->chains_per_prp_buffer;
-			    j < ioc->chains_needed_per_io; j++) {
-				ct = &ioc->chain_lookup[i].chains_per_smid[j];
-				if (ct && ct->chain_buffer)
-					dma_pool_free(ioc->chain_dma_pool,
-						ct->chain_buffer,
-						ct->chain_buffer_dma);
-			}
-			kfree(ioc->chain_lookup[i].chains_per_smid);
-		}
-		dma_pool_destroy(ioc->chain_dma_pool);
-		kfree(ioc->chain_lookup);
-		ioc->chain_lookup = NULL;
-	}
+	_base_release_chain_lookup(ioc);
 }

@@ -4784,29 +4895,8 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 		total_sz += sz * ioc->scsiio_depth;
 	}

-	ioc->chain_dma_pool = dma_pool_create("chain pool", &ioc->pdev->dev,
-	    ioc->chain_segment_sz, 16, 0);
-	if (!ioc->chain_dma_pool) {
-		pr_err(MPT3SAS_FMT "chain_dma_pool: dma_pool_create failed\n",
-			ioc->name);
+	if (_base_allocate_chain_lookup(ioc, &total_sz))
 		goto out;
-	}
-	for (i = 0; i < ioc->scsiio_depth; i++) {
-		for (j = ioc->chains_per_prp_buffer;
-		    j < ioc->chains_needed_per_io; j++) {
-			ct = &ioc->chain_lookup[i].chains_per_smid[j];
-			ct->chain_buffer = dma_pool_alloc(
-					ioc->chain_dma_pool, GFP_KERNEL,
-					&ct->chain_buffer_dma);
-			if (!ct->chain_buffer) {
-				pr_err(MPT3SAS_FMT "chain_lookup: "
-				    " pci_pool_alloc failed\n", ioc->name);
-				_base_release_memory_pools(ioc);
-				goto out;
-			}
-		}
-		total_sz += ioc->chain_segment_sz;
-	}

 	dinitprintk(ioc, pr_info(MPT3SAS_FMT
	    "chain pool depth(%d), frame_size(%d), pool_size(%d kB)\n",
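
The sizing math at the top of _base_allocate_chain_lookup() can likewise be
checked in isolation.  In the sketch below, ALIGN_UP() and PAGE_SZ are
stand-ins for the kernel's ALIGN() and PAGE_SIZE, and the 328-byte segment
size is an arbitrary example value:

/* chain_sizing_demo.c: batch size and chains-per-allocation arithmetic */
#include <stdio.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define PAGE_SZ		4096u

static unsigned int chains_per_allocation(unsigned int chain_segment_sz,
					  unsigned int *allocation_sz)
{
	/* round the segment up to a 16-byte multiple, as the patch does */
	unsigned int aligned = ALIGN_UP(chain_segment_sz, 16u);

	/* allocate at least a page at a time */
	*allocation_sz = aligned > PAGE_SZ ? aligned : PAGE_SZ;
	return *allocation_sz / aligned;
}

int main(void)
{
	unsigned int alloc_sz;
	/* 328 rounds up to 336; 4096 / 336 = 12 chains per allocation */
	unsigned int n = chains_per_allocation(328, &alloc_sz);

	printf("allocation %u bytes -> %u chains\n", alloc_sz, n);
	return 0;
}

Because each batch starts at a dma_alloc_coherent() result (which is at least
page aligned) and advances in 16-byte multiples, every carved chain buffer
keeps the 16-byte alignment the old dma_pool_create() call requested.
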
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index f02974c..7ee81d5 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1298,7 +1298,6 @@ struct MPT3SAS_ADAPTER {
 	/* chain */
 	struct chain_lookup *chain_lookup;
 	struct list_head free_chain_list;
-	struct dma_pool *chain_dma_pool;
 	ulong chain_pages;
 	u16 max_sges_in_main_message;
 	u16 max_sges_in_chain_message;
@@ -1306,6 +1305,8 @@ struct MPT3SAS_ADAPTER {
 	u32 chain_depth;
 	u16 chain_segment_sz;
 	u16 chains_per_prp_buffer;
+	u32 chain_allocation_sz;
+	u32 chains_per_allocation;

 	/* hi-priority queue */
 	u16 hi_priority_smid;