From patchwork Tue Aug 9 09:22:13 2022
X-Patchwork-Submitter: Kashyap Desai
X-Patchwork-Id: 12939484
From: Kashyap Desai <kashyap.desai@broadcom.com>
To: linux-rdma@vger.kernel.org
Cc: jgg@nvidia.com, leonro@nvidia.com, selvin.xavier@broadcom.com,
    andrew.gospodarek@broadcom.com, Kashyap Desai <kashyap.desai@broadcom.com>
Subject: [PATCH rdma-rc] RDMA/core: fix sg_to_page mapping for boundary condition
Date: Tue, 9 Aug 2022 14:52:13 +0530
Message-Id: <20220809092213.1063297-1-kashyap.desai@broadcom.com>
X-Mailing-List: linux-rdma@vger.kernel.org

This issue is frequently hit when the AMD IOMMU is enabled. For a 1MB data
transfer, the IB core is supposed to set 256 entries of 4K page size in the
MR page table.
Because of a defect in ib_sg_to_pages(), it breaks just after setting one
entry. The memory region page table is left with stale entries (or NULL
entries if the memory was memset). Something like this -

crash> x/32a 0xffff9cd9f7f84000
<<< Only the first entry is valid; the rest are stale entries >>>
0xffff9cd9f7f84000:     0xfffffffffff00000      0x68d31000
0xffff9cd9f7f84010:     0x68d32000      0x68d33000
0xffff9cd9f7f84020:     0x68d34000      0x975c5000
0xffff9cd9f7f84030:     0x975c6000      0x975c7000
0xffff9cd9f7f84040:     0x975c8000      0x975c9000
0xffff9cd9f7f84050:     0x975ca000      0x975cb000
0xffff9cd9f7f84060:     0x975cc000      0x975cd000
0xffff9cd9f7f84070:     0x975ce000      0x975cf000
0xffff9cd9f7f84080:     0x0     0x0
0xffff9cd9f7f84090:     0x0     0x0
0xffff9cd9f7f840a0:     0x0     0x0
0xffff9cd9f7f840b0:     0x0     0x0
0xffff9cd9f7f840c0:     0x0     0x0
0xffff9cd9f7f840d0:     0x0     0x0
0xffff9cd9f7f840e0:     0x0     0x0
0xffff9cd9f7f840f0:     0x0     0x0

All entries other than 0xfffffffffff00000 are stale. Once such incorrect
page entries are passed to the RDMA h/w, the AMD IOMMU module detects a
page fault whenever the h/w tries to access addresses which were never
actually populated by the IB stack. The following print is logged whenever
this issue hits:

bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e
address=0x68d31000 flags=0x0050]

The ib_sg_to_pages() function populates the correct page addresses in most
cases, but there is one boundary condition which is not handled correctly.

Boundary condition explained -
Page addresses are not populated correctly if the dma buffer is mapped to
the very last region of the address space. For example, whenever page_addr
is 0xfffffffffff00000 (the last 1MB section of the address space) and the
dma length is 1MB, the end of the dma address wraps around to 0 (derived
from 0xfffffffffff00000 + 0x100000), so the mapping loop exits after the
first page.

Use the dma buffer length instead of end_dma_addr to fill page addresses.

Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
---
 drivers/infiniband/core/verbs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e54b3f1b730e..36137735cd04 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2676,15 +2676,19 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 		u64 dma_addr = sg_dma_address(sg) + sg_offset;
 		u64 prev_addr = dma_addr;
 		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
+		unsigned int curr_dma_len = 0;
+		unsigned int page_off = 0;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
+		if (i == 0)
+			page_off = dma_addr - page_addr;
 		/*
 		 * For the second and later elements, check whether either the
 		 * end of element i-1 or the start of element i is not aligned
 		 * on a page boundary.
 		 */
-		if (i && (last_page_off != 0 || page_addr != dma_addr)) {
+		else if (last_page_off != 0 || page_addr != dma_addr) {
 			/* Stop mapping if there is a gap. */
 			if (last_end_dma_addr != dma_addr)
 				break;
@@ -2708,8 +2712,9 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 			}
 			prev_addr = page_addr;
 next_page:
+			curr_dma_len += mr->page_size - page_off;
 			page_addr += mr->page_size;
-		} while (page_addr < end_dma_addr);
+		} while (curr_dma_len < dma_len);
 
 		mr->length += dma_len;
 		last_end_dma_addr = end_dma_addr;
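
For illustration only (not part of the patch), below is a small standalone
userspace sketch of the arithmetic described above. It assumes the same
values as the commit message: a 1MB, 4K-aligned buffer that ends exactly at
the top of the 64-bit address space. end_dma_addr wraps to 0, so the old
"page_addr < end_dma_addr" test stops the do/while after a single entry,
while counting the bytes covered against dma_len yields the expected 256
entries.

/* sg_wrap_demo.c - illustrative only; mirrors the two loop conditions. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const uint64_t dma_addr = 0xfffffffffff00000ULL; /* last 1MB of address space */
	const unsigned int dma_len = 0x100000;           /* 1MB transfer */
	const unsigned int page_size = 0x1000;           /* 4K MR page size */
	const uint64_t page_mask = ~((uint64_t)page_size - 1);

	uint64_t end_dma_addr = dma_addr + dma_len;      /* wraps around to 0 */
	uint64_t page_addr = dma_addr & page_mask;

	/* Old loop condition: exits after the first entry because
	 * "page_addr < 0" can never be true. */
	unsigned int old_entries = 0;
	uint64_t p = page_addr;
	do {
		old_entries++;          /* one page entry "set" */
		p += page_size;
	} while (p < end_dma_addr);

	/* New loop condition: count bytes covered against dma_len.
	 * page_off is 0 here because dma_addr is already 4K aligned. */
	unsigned int new_entries = 0;
	unsigned int curr_dma_len = 0;
	do {
		new_entries++;
		curr_dma_len += page_size;
	} while (curr_dma_len < dma_len);

	printf("end_dma_addr      = 0x%llx\n", (unsigned long long)end_dma_addr);
	printf("old-style entries = %u (only 1 of 256 pages mapped)\n", old_entries);
	printf("new-style entries = %u\n", new_entries);
	return 0;
}

Counting the bytes consumed, rather than comparing against an end address,
does not depend on where the buffer sits in the address space, which is why
the overflow at the very top of the range no longer matters.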