From patchwork Mon Mar 20 11:15:50 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shanker Donthineni
X-Patchwork-Id: 9633847
Subject: Re: [PATCH v3] irqchip/gicv3-its: Avoid memory over allocation for ITEs
References: <1488896720-6223-1-git-send-email-shankerd@codeaurora.org>
 <24159fb4-0438-9f9e-9e28-6ac573f6b6f6@arm.com>
To: Marc Zyngier
From: Shanker Donthineni
Message-ID: <51ec631a-da0e-dfb3-873f-2c9cd2e3e085@codeaurora.org>
Date: Mon, 20 Mar 2017 06:15:50 -0500
Reply-To: shankerd@codeaurora.org
Cc: linux-arm-kernel, Thomas Gleixner, Jason Cooper, Vikram Sethi, linux-kernel

Hi Marc,

On 03/20/2017 05:14 AM, Shanker Donthineni wrote:
> Hi Marc,
>
>
> On 03/17/2017 10:33 AM, Marc Zyngier wrote:
>> On 17/03/17 14:18, Shanker Donthineni wrote:
>>> Hi Marc,
>>>
>>>
>>> On 03/17/2017 08:50 AM, Marc Zyngier wrote:
>>>> On 07/03/17 14:25, Shanker Donthineni wrote:
>>>>> We are always allocating an extra 255 bytes of memory to handle the ITE
>>>>> physical address alignment requirement. The kmalloc() satisfies
>>>>> the ITE alignment since the ITS driver is requesting a minimum
>>>>> size of ITS_ITT_ALIGN bytes.
>>>>>
>>>>> Let's try to allocate the exact amount of memory that is required
>>>>> for ITEs to avoid wastage.
>>>>>
>>>>> Signed-off-by: Shanker Donthineni
>>>>> ---
>>>>> v2: removed 'Change-Id: Ia8084189833f2081ff13c392deb5070c46a64038' from commit.
>>>>> v3: changed from IITE to ITE.
>>>>>
>>>>>  drivers/irqchip/irq-gic-v3-its.c | 7 ++++++-
>>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>>> index 86bd428..5aeca78 100644
>>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>>> @@ -1329,8 +1329,13 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>>>  	 */
>>>>>  	nr_ites = max(2UL, roundup_pow_of_two(nvecs));
>>>>>  	sz = nr_ites * its->ite_size;
>>>>> -	sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
>>>>> +	sz = max(sz, ITS_ITT_ALIGN);
>>>>>  	itt = kzalloc(sz, GFP_KERNEL);
>>>>> +	if (itt && !IS_ALIGNED(virt_to_phys(itt), ITS_ITT_ALIGN)) {
>>>>> +		kfree(itt);
>>>>> +		itt = kzalloc(sz + ITS_ITT_ALIGN - 1, GFP_KERNEL);
>>>>> +	}
>>>>> +
>>>> Is this really worth the complexity? Are you aware of a system where the
>>>> accumulation of overallocation actually shows up as being an issue?
>>> As such there is no issue with over allocation. Actually, this change
>>> masked the QDF2400 bug 'irqchip/gicv3-its: Add workaround for QDF2400
>>> ITS erratum 0065' until now; it was found and fixed recently while
>>> looking at the code for possible memory optimizations.
>>>
>>>> If you want to be absolutely exact in your allocation, then I'd suggest
>>>> doing it all the time, and have a proper dedicated allocator that always
>>>> does the right thing, without a wasteful fallback like you still have here.
>>> We don't need the fallback, and it can be removed safely. Looking for
>>> your suggestion. Should I implement a dedicated allocator or remove
>>> the fallback for simpler code?
>> Are you saying that kmalloc is guaranteed to give us something that is
>> 256 byte aligned? If so, why do we test for alignment (with free +
>> over-allocate if it fails)?
> I've verified on my system that kmalloc() always allocates memory with 256-byte alignment.
kmalloc() uses the generic slab caches available in the kernel to allocate memory based on the input size.
>
>> I'd rather have only one way of allocating the ITT. Either we always
>> overallocate in order to guarantee right alignment (and my personal view
>> is that for most systems, this doesn't matter at all), or we create our
>> own allocator. The issue with the latter is that we don't really have a
>> good story for allocating arrays of objects with a given alignment
>> (kmem_cache_* only deals with single objects).
> Adding a dedicated function to allocate memory is preferable, but it needs a few more lines of code.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index a27a074..f0125e5 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -90,6 +90,8 @@ struct its_node {
>  	u32 ite_size;
>  	u32 device_ids;
>  	int numa_node;
> +	struct page *ite_page;
> +	u32 ite_psz;
>  };
>
>  #define ITS_ITT_ALIGN SZ_256
> @@ -266,7 +268,6 @@ static struct its_collection *its_build_mapd_cmd(struct its_cmd_block *cmd,
>  	u8 size = ilog2(desc->its_mapd_cmd.dev->nr_ites);
>
>  	itt_addr = virt_to_phys(desc->its_mapd_cmd.dev->itt);
> -	itt_addr = ALIGN(itt_addr, ITS_ITT_ALIGN);
>
>  	its_encode_cmd(cmd, GITS_CMD_MAPD);
>  	its_encode_devid(cmd, desc->its_mapd_cmd.dev->device_id);
> @@ -1319,6 +1320,42 @@ static bool its_alloc_device_table(struct its_node *its, u32 dev_id)
>  	return true;
>  }
>
> +static void *its_alloc_memory_ites(struct its_node *its, int nr_ites)
> +{
> +	unsigned long flags;
> +	struct page *page;
> +	void *ite;
> +	u32 size;
> +
> +	size = ALIGN(nr_ites * its->ite_size, ITS_ITT_ALIGN);
> +	raw_spin_lock_irqsave(&its->lock, flags);
> +
> +	/* Try to reuse the current page if enough space is available */
> +	if (size > its->ite_psz) {
> +		/* Allocate a new compound page with minimum order 1 */
> +		page = alloc_pages(GFP_KERNEL | __GFP_COMP | __GFP_ZERO,
> +				   max(get_order(size), 1));
> +		if (!page) {
> +			raw_spin_unlock_irqrestore(&its->lock, flags);
> +			return NULL;
> +		}
> +
> +		/* Free current page, decrement page count */
> +		if (its->ite_page)
> +			put_page(its->ite_page);
> +		its->ite_psz = PAGE_ORDER_TO_SIZE(compound_order(page));
> +		its->ite_page = page;
> +	}
> +
> +	get_page(its->ite_page);	/* increment page count */
> +	its->ite_psz -= size;		/* update free space */
> +	ite = page_address(its->ite_page) + its->ite_psz;
> +	raw_spin_unlock_irqrestore(&its->lock, flags);
> +	gic_flush_dcache_to_poc(ite, size);
> +
> +	return ite;
> +}
> +
>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  					    int nvecs)
>  {
> @@ -1330,7 +1367,6 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  	int lpi_base;
>  	int nr_lpis;
>  	int nr_ites;
> -	int sz;
>
>  	if (!its_alloc_device_table(its, dev_id))
>  		return NULL;
> @@ -1342,22 +1378,22 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  	 * express an ITT with a single entry.
>  	 */
>  	nr_ites = max(2UL, roundup_pow_of_two(nvecs));
> -	sz = nr_ites * its->ite_size;
> -	sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
> -	itt = kzalloc(sz, GFP_KERNEL);
> +	itt = its_alloc_memory_ites(its, nr_ites);
> +	if (!itt)
> +		return NULL;
> +
>  	lpi_map = its_lpi_alloc_chunks(nvecs, &lpi_base, &nr_lpis);
>  	if (lpi_map)
>  		col_map = kzalloc(sizeof(*col_map) * nr_lpis, GFP_KERNEL);
>
> -	if (!dev || !itt || !lpi_map || !col_map) {
> +	if (!dev || !lpi_map || !col_map) {
>  		kfree(dev);
> -		kfree(itt);
> +		put_page(virt_to_page(itt));
>  		kfree(lpi_map);
>  		kfree(col_map);
>  		return NULL;
>  	}
>
> -	gic_flush_dcache_to_poc(itt, sz);
>
>  	dev->its = its;
>  	dev->itt = itt;
> @@ -1386,7 +1422,7 @@ static void its_free_device(struct its_device *its_dev)
>  	raw_spin_lock_irqsave(&its_dev->its->lock, flags);
>  	list_del(&its_dev->entry);
>  	raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
> -	kfree(its_dev->itt);
> +	put_page(virt_to_page(its_dev->itt));
>  	kfree(its_dev);
>  }
>
>

This patch is not urgent; if you want, we can revisit it at a later time.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 86bd428..5aeca78 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1329,8 +1329,13 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	 */
 	nr_ites = max(2UL, roundup_pow_of_two(nvecs));
 	sz = nr_ites * its->ite_size;
-	sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
+	sz = max(sz, ITS_ITT_ALIGN);
 	itt = kzalloc(sz, GFP_KERNEL);
+	if (itt && !IS_ALIGNED(virt_to_phys(itt), ITS_ITT_ALIGN)) {
+		kfree(itt);
+		itt = kzalloc(sz + ITS_ITT_ALIGN - 1, GFP_KERNEL);
+	}
+
 	lpi_map = its_lpi_alloc_chunks(nvecs, &lpi_base, &nr_lpis);
 	if (lpi_map)
 		col_map = kzalloc(sizeof(*col_map) * nr_lpis, GFP_KERNEL);
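
For readers following the thread, the size arithmetic and the carve-from-the-top scheme of its_alloc_memory_ites() can be modelled in userspace. This is only a rough sketch under stated assumptions, not driver code: struct itt_pool, itt_pool_alloc(), align_up(), itt_size_overallocated() and the block size passed in are hypothetical names and values, aligned_alloc() stands in for alloc_pages(), and old blocks are simply never released, whereas the kernel proposal relies on page refcounts for that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ITT_ALIGN 256u	/* mirrors ITS_ITT_ALIGN (SZ_256) from the thread */

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Size requested by the existing code: always ITT_ALIGN - 1 bytes of slack. */
static size_t itt_size_overallocated(size_t nr_ites, size_t ite_size)
{
	size_t sz = nr_ites * ite_size;

	return (sz > ITT_ALIGN ? sz : ITT_ALIGN) + ITT_ALIGN - 1;
}

/* Hypothetical stand-in for the ite_page/ite_psz pair added to its_node. */
struct itt_pool {
	unsigned char *base;	/* current block, ITT_ALIGN-aligned */
	size_t free_bytes;	/* unused bytes, counted from base */
};

/*
 * Hand out an ITT from the top of the current block. A fresh zeroed
 * block is taken when the remaining space is too small. Because the
 * block base and every request are multiples of ITT_ALIGN, each
 * returned pointer is ITT_ALIGN-aligned without any padding.
 * Limitation of this sketch: exhausted blocks are never freed.
 */
static void *itt_pool_alloc(struct itt_pool *pool, size_t nr_ites,
			    size_t ite_size, size_t block_size)
{
	size_t size = align_up(nr_ites * ite_size, ITT_ALIGN);

	if (size > pool->free_bytes) {
		size_t want = size > block_size ? size : block_size;
		size_t bsz = align_up(want, ITT_ALIGN);
		unsigned char *blk = aligned_alloc(ITT_ALIGN, bsz);

		if (!blk)
			return NULL;
		memset(blk, 0, bsz);
		pool->base = blk;
		pool->free_bytes = bsz;
	}

	pool->free_bytes -= size;
	return pool->base + pool->free_bytes;
}
```

With 8-byte ITEs (an assumed size), itt_pool_alloc(&pool, 2, 8, 4096) takes exactly 256 bytes from the end of a fresh block, where the over-allocating formula would have requested 511 bytes for the same two entries.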