From patchwork Wed May 30 03:34:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 10437585
Subject: Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
To: David Miller, qing.huang@oracle.com
Cc: tariqt@mellanox.com, haakon.bugge@oracle.com, yanjun.zhu@oracle.com,
 netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
 linux-kernel@vger.kernel.org, gi-oh.kim@profitbricks.com
References: <20180523232246.20445-1-qing.huang@oracle.com>
 <20180525.102321.858995452200286788.davem@davemloft.net>
From: Eric Dumazet
Message-ID: <7a353b65-6b7f-1aee-1c48-e83c8e02f693@gmail.com>
Date: Tue, 29 May 2018 23:34:34 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.6.0
In-Reply-To: <20180525.102321.858995452200286788.davem@davemloft.net>
Content-Language: en-US
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

On 05/25/2018 10:23 AM, David Miller wrote:
> From: Qing Huang
> Date: Wed, 23 May 2018 16:22:46 -0700
>
>> When a system is under memory presure (high usage with fragments),
>> the original 256KB ICM chunk allocations will likely trigger kernel
>> memory management to enter slow path doing memory compact/migration
>> ops in order to complete high order memory allocations.
>>
>> When that happens, user processes calling uverb APIs may get stuck
>> for more than 120s easily even though there are a lot of free pages
>> in smaller chunks available in the system.
>>
>> Syslog:
>> ...
>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>> oracle_205573_e:205573 blocked for more than 120 seconds.
>> ...
>>
>> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>>
>> However in order to support smaller ICM chunk size, we need to fix
>> another issue in large size kcalloc allocations.
>>
>> E.g.
>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
>> entry). So we need a 16MB allocation for a table->icm pointer array to
>> hold 2M pointers which can easily cause kcalloc to fail.
>>
>> The solution is to use kvzalloc to replace kcalloc which will fall back
>> to vmalloc automatically if kmalloc fails.
>>
>> Signed-off-by: Qing Huang
>> Acked-by: Daniel Jurgens
>> Reviewed-by: Zhu Yanjun
>
> Applied, thanks.
>

I must say this patch causes regressions here.

KASAN is not happy.

It looks like you guys did not really look at mlx4_alloc_icm().

This function already handles high-order allocations properly, with
fallbacks to order-0 pages under high memory pressure.

BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585

CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G O
Call Trace:
 [] dump_stack+0x4d/0x72
 [] print_address_description+0x6f/0x260
 [] kasan_report+0x257/0x370
 [] __asan_report_load4_noabort+0x19/0x20
 [] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
 [] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
 [] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
 [] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
 [] seq_read+0xa9c/0x1230
 [] proc_reg_read+0xc1/0x180
 [] __vfs_read+0xe8/0x730
 [] vfs_read+0xf7/0x300
 [] SyS_read+0xd2/0x1b0
 [] do_syscall_64+0x186/0x420
 [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f851a7bb30d
RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000

Allocated by task 4488:
 save_stack+0x46/0xd0
 kasan_kmalloc+0xad/0xe0
 __kmalloc+0x101/0x5e0
 ib_register_device+0xc03/0x1250 [ib_core]
 mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
 mlx4_add_device+0xa9/0x340 [mlx4_core]
 mlx4_register_interface+0x16e/0x390 [mlx4_core]
 xhci_pci_remove+0x7a/0x180 [xhci_pci]
 do_one_initcall+0xa0/0x230
 do_init_module+0x1b9/0x5a4
 load_module+0x63e6/0x94c0
 SYSC_init_module+0x1a4/0x1c0
 SyS_init_module+0xe/0x10
 do_syscall_64+0x186/0x420
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8817df584f40
 which belongs to the cache kmalloc-32 of size 32
The buggy address is located 8 bytes
to the right of 32-byte region [ffff8817df584f40, ffff8817df584f60)
The buggy address belongs to the page:
page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
flags: 0x880000000000100(slab)
raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff883ff78d26c0

Memory state around the buggy address:
 ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
 ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
                                                          ^
 ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

I will test :

---
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..4d2a71381acb739585d662175e86caef72338097 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
 #include "fw.h"
 
 /*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE = PAGE_SIZE,
-	MLX4_TABLE_CHUNK_SIZE = PAGE_SIZE,
+	MLX4_ICM_ALLOC_SIZE = 1 << 18,
+	MLX4_TABLE_CHUNK_SIZE = 1 << 18
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)