From patchwork Thu May 21 19:23:52 2015
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 6458411
From: j.glisse@gmail.com
To: akpm@linux-foundation.org
Cc: Linus Torvalds, Mel Gorman, "H. Peter Anvin", Peter Zijlstra,
    Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
    Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole, Sherry Cheung,
    Subhash Gutti, John Hubbard, Mark Hairgrove, Lucien Dunning,
    Cameron Buschardt, Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel,
    Liran Liss, Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
    Michael Mantor, Paul Blinzer, Laurent Morichetti, Alexander Deucher,
    Oded Gabbay, Jérôme Glisse
Subject: [PATCH 35/36] IB/mlx5/hmm: add page fault support for ODP on HMM.
Date: Thu, 21 May 2015 15:23:52 -0400
Message-Id: <1432236233-4035-36-git-send-email-j.glisse@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1432236233-4035-1-git-send-email-j.glisse@gmail.com>
References: <1432236233-4035-1-git-send-email-j.glisse@gmail.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Jérôme Glisse

This patch adds HMM-specific support for hardware page faulting of user
memory regions.

Signed-off-by: Jérôme Glisse
cc:
---
 drivers/infiniband/hw/mlx5/odp.c | 147 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index bd29155..093f5b8 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -56,6 +56,55 @@ static struct mlx5_ib_mr *mlx5_ib_odp_find_mr_lkey(struct mlx5_ib_dev *dev,
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM
 
+struct mlx5_hmm_pfault {
+	struct mlx5_ib_mr	*mlx5_ib_mr;
+	u64			start_idx;
+	dma_addr_t		access_mask;
+	unsigned		npages;
+	struct hmm_event	event;
+};
+
+static int mlx5_hmm_pfault(struct mlx5_ib_dev *mlx5_ib_dev,
+			   struct hmm_mirror *mirror,
+			   const struct hmm_event *event)
+{
+	struct mlx5_hmm_pfault *pfault;
+	struct hmm_pt_iter iter;
+	unsigned long addr, cnt;
+	int ret;
+
+	pfault = container_of(event, struct mlx5_hmm_pfault, event);
+	hmm_pt_iter_init(&iter);
+
+	for (addr = event->start, cnt = 0; addr < event->end;
+	     addr += PAGE_SIZE, ++cnt) {
+		dma_addr_t *ptep;
+
+		/* Get and lock pointer to mirror page table. */
+		ptep = hmm_pt_iter_update(&iter, &mirror->pt, addr);
+		/* This could be BUG_ON() as it cannot happen. */
+		if (!ptep || !hmm_pte_test_valid_dma(ptep)) {
+			pr_warn("got empty mirror page table on pagefault.\n");
+			return -EINVAL;
+		}
+		if ((pfault->access_mask & ODP_WRITE_ALLOWED_BIT)) {
+			if (!hmm_pte_test_write(ptep)) {
+				pr_warn("got wrong protection permission on "
+					"pagefault.\n");
+				return -EINVAL;
+			}
+			hmm_pte_set_bit(ptep, ODP_WRITE_ALLOWED_SHIFT);
+		}
+		hmm_pte_set_bit(ptep, ODP_READ_ALLOWED_SHIFT);
+		pfault->npages++;
+	}
+	ret = mlx5_ib_update_mtt(pfault->mlx5_ib_mr,
+				 pfault->start_idx,
+				 cnt, 0, &iter);
+	hmm_pt_iter_fini(&iter, &mirror->pt);
+	return ret;
+}
+
 int mlx5_ib_umem_invalidate(struct ib_umem *umem, u64 start,
 			    u64 end, void *cookie)
 {
@@ -178,12 +227,19 @@ static int mlx5_hmm_update(struct hmm_mirror *mirror,
 			   const struct hmm_event *event)
 {
 	struct device *device = mirror->device->dev;
+	struct mlx5_ib_dev *mlx5_ib_dev;
+	struct ib_device *ib_device;
 	int ret = 0;
 
+	ib_device = container_of(mirror->device, struct ib_device, hmm_dev);
+	mlx5_ib_dev = to_mdev(ib_device);
+
 	switch (event->etype) {
 	case HMM_DEVICE_RFAULT:
 	case HMM_DEVICE_WFAULT:
-		/* FIXME implement. */
+		ret = mlx5_hmm_pfault(mlx5_ib_dev, mirror, event);
+		if (ret)
+			return ret;
 		break;
 	case HMM_ISDIRTY:
 		hmm_mirror_range_dirty(mirror, event->start, event->end);
@@ -228,6 +284,95 @@ void mlx5_dev_fini_odp_hmm(struct ib_device *ib_device)
 	hmm_device_unregister(&ib_device->hmm_dev);
 }
 
+/*
+ * Handle a single data segment in a page-fault WQE.
+ *
+ * Returns number of pages retrieved on success. The caller will continue to
+ * the next data segment.
+ * Can return the following error codes:
+ * -EAGAIN to designate a temporary error. The caller will abort handling the
+ *  page fault and resolve it.
+ * -EFAULT when there's an error mapping the requested pages. The caller will
+ *  abort the page fault handling and possibly move the QP to an error state.
+ * On other errors the QP should also be closed with an error.
+ */
+static int pagefault_single_data_segment(struct mlx5_ib_qp *qp,
+					 struct mlx5_ib_pfault *pfault,
+					 u32 key, u64 io_virt, size_t bcnt,
+					 u32 *bytes_mapped)
+{
+	struct mlx5_ib_dev *mlx5_ib_dev = to_mdev(qp->ibqp.pd->device);
+	struct ib_mirror *ib_mirror;
+	struct mlx5_hmm_pfault hmm_pfault;
+	int srcu_key;
+	int ret = 0;
+
+	srcu_key = srcu_read_lock(&mlx5_ib_dev->mr_srcu);
+	hmm_pfault.mlx5_ib_mr = mlx5_ib_odp_find_mr_lkey(mlx5_ib_dev, key);
+	/*
+	 * If we didn't find the MR, it means the MR was closed while we were
+	 * handling the ODP event. In this case we return -EFAULT so that the
+	 * QP will be closed.
+	 */
+	if (!hmm_pfault.mlx5_ib_mr || !hmm_pfault.mlx5_ib_mr->ibmr.pd) {
+		pr_err("Failed to find relevant mr for lkey=0x%06x, probably "
+		       "the MR was destroyed\n", key);
+		ret = -EFAULT;
+		goto srcu_unlock;
+	}
+	if (!hmm_pfault.mlx5_ib_mr->umem->odp_data) {
+		pr_debug("skipping non ODP MR (lkey=0x%06x) in page fault "
+			 "handler.\n", key);
+		if (bytes_mapped)
+			*bytes_mapped +=
+				(bcnt - pfault->mpfault.bytes_committed);
+		goto srcu_unlock;
+	}
+	if (hmm_pfault.mlx5_ib_mr->ibmr.pd != qp->ibqp.pd) {
+		pr_err("Page-fault with different PDs for QP and MR.\n");
+		ret = -EFAULT;
+		goto srcu_unlock;
+	}
+
+	ib_mirror = hmm_pfault.mlx5_ib_mr->umem->odp_data->ib_mirror;
+	if (ib_mirror->base.hmm == NULL) {
+		/* Somehow the mirror was killed from under us. */
+		ret = -EFAULT;
+		goto srcu_unlock;
+	}
+
+	/*
+	 * Avoid branches - this code will perform correctly
+	 * in all iterations (in iteration 2 and above,
+	 * bytes_committed == 0).
+	 */
+	io_virt += pfault->mpfault.bytes_committed;
+	bcnt -= pfault->mpfault.bytes_committed;
+
+	hmm_pfault.npages = 0;
+	hmm_pfault.start_idx = (io_virt - (hmm_pfault.mlx5_ib_mr->mmr.iova &
+				PAGE_MASK)) >> PAGE_SHIFT;
+	hmm_pfault.access_mask = ODP_READ_ALLOWED_BIT;
+	hmm_pfault.access_mask |= hmm_pfault.mlx5_ib_mr->umem->writable ?
+				  ODP_WRITE_ALLOWED_BIT : 0;
+	hmm_pfault.event.start = io_virt & PAGE_MASK;
+	hmm_pfault.event.end = PAGE_ALIGN(io_virt + bcnt);
+	hmm_pfault.event.etype = hmm_pfault.mlx5_ib_mr->umem->writable ?
+				 HMM_DEVICE_WFAULT : HMM_DEVICE_RFAULT;
+	ret = hmm_mirror_fault(&ib_mirror->base, &hmm_pfault.event);
+
+	if (!ret && hmm_pfault.npages && bytes_mapped) {
+		u32 new_mappings = hmm_pfault.npages * PAGE_SIZE -
+			(io_virt - round_down(io_virt, PAGE_SIZE));
+		*bytes_mapped += min_t(u32, new_mappings, bcnt);
+	}
+
+srcu_unlock:
+	srcu_read_unlock(&mlx5_ib_dev->mr_srcu, srcu_key);
+	pfault->mpfault.bytes_committed = 0;
+	return ret ? ret : hmm_pfault.npages;
+}
+
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
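
For readers following the flow above: mlx5_hmm_pfault() walks the mirror page
table over the faulting range, marks each page as read-allowed (and
write-allowed when the fault requested write access and the mirror entry
permits it), counts the pages, and then pushes the whole range to the HCA with
a single mlx5_ib_update_mtt() call. Below is a minimal, self-contained
userspace sketch of just that loop; the mirror table is modeled as a plain
array of flag words, and the PTE_*/ODP_* names and the update_mirror_range()
helper are invented for the sketch rather than taken from the kernel.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT	12
    #define PAGE_SIZE	(1UL << PAGE_SHIFT)

    /* Stand-in flag bits; the driver uses hmm_pte_* helpers and ODP_*_SHIFT. */
    #define PTE_VALID		(1u << 0)
    #define PTE_WRITE		(1u << 1)
    #define ODP_READ_ALLOWED	(1u << 2)
    #define ODP_WRITE_ALLOWED	(1u << 3)

    /*
     * Walk [start, end) one page at a time, mirroring the loop in
     * mlx5_hmm_pfault(): refuse invalid entries, refuse write faults on
     * read-only entries, otherwise set the ODP allowed bits.
     * Returns the number of pages updated, or -1 on error.
     */
    static long update_mirror_range(uint32_t *mirror_pt,
				    unsigned long npages_total,
				    unsigned long start, unsigned long end,
				    int want_write)
    {
	    unsigned long addr, cnt = 0;

	    for (addr = start; addr < end; addr += PAGE_SIZE, ++cnt) {
		    unsigned long idx = addr >> PAGE_SHIFT;

		    if (idx >= npages_total || !(mirror_pt[idx] & PTE_VALID))
			    return -1;	/* empty mirror entry */
		    if (want_write) {
			    if (!(mirror_pt[idx] & PTE_WRITE))
				    return -1;	/* wrong protection */
			    mirror_pt[idx] |= ODP_WRITE_ALLOWED;
		    }
		    mirror_pt[idx] |= ODP_READ_ALLOWED;
	    }
	    return (long)cnt;	/* one MTT update then covers cnt pages */
    }

    int main(void)
    {
	    uint32_t pt[8];
	    long n;

	    for (int i = 0; i < 8; i++)
		    pt[i] = PTE_VALID | PTE_WRITE;	/* pretend all pages are mapped RW */

	    n = update_mirror_range(pt, 8, 0, 4 * PAGE_SIZE, 1);
	    printf("updated %ld pages, pte[0]=0x%x\n", n, (unsigned)pt[0]);
	    return 0;
    }

The real handler differs in that hmm_pt_iter_update() locks and returns a
pointer into HMM's mirror page table, and a failure is reported as -EINVAL so
the caller can tear the QP down.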
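
Similarly, the arithmetic in pagefault_single_data_segment() is easier to see
with concrete numbers: the handler skips the bytes earlier iterations already
committed, derives the MTT start index from the MR's iova, page-aligns the
fault range handed to hmm_mirror_fault(), and afterwards clamps the newly
mapped byte count to what the segment asked for. A small standalone sketch of
just that arithmetic follows; the input values are invented for illustration,
only the formulas come from the patch.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT	12
    #define PAGE_SIZE	(1ULL << PAGE_SHIFT)
    #define PAGE_MASK	(~(PAGE_SIZE - 1))
    #define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & PAGE_MASK)

    int main(void)
    {
	    /* Example inputs (hypothetical values, not from the patch). */
	    uint64_t iova = 0x7f0000001000ULL;		/* MR base address */
	    uint64_t io_virt = 0x7f0000003a00ULL;	/* faulting virtual address */
	    uint64_t bcnt = 0x2400;			/* bytes left in the segment */
	    uint64_t bytes_committed = 0x200;		/* handled in earlier iterations */

	    /* Skip what earlier iterations already committed. */
	    io_virt += bytes_committed;
	    bcnt -= bytes_committed;

	    /* MTT index of the first faulting page, relative to the MR start. */
	    uint64_t start_idx = (io_virt - (iova & PAGE_MASK)) >> PAGE_SHIFT;

	    /* Page-aligned range handed to the mirror fault. */
	    uint64_t ev_start = io_virt & PAGE_MASK;
	    uint64_t ev_end = PAGE_ALIGN(io_virt + bcnt);
	    uint64_t npages = (ev_end - ev_start) >> PAGE_SHIFT;

	    /* Bytes newly usable by the QP, clamped to what was asked for. */
	    uint64_t new_mappings = npages * PAGE_SIZE - (io_virt - ev_start);
	    uint64_t bytes_mapped = new_mappings < bcnt ? new_mappings : bcnt;

	    printf("start_idx=%llu range=[%#llx, %#llx) npages=%llu bytes_mapped=%#llx\n",
		   (unsigned long long)start_idx,
		   (unsigned long long)ev_start, (unsigned long long)ev_end,
		   (unsigned long long)npages, (unsigned long long)bytes_mapped);
	    return 0;
    }

With these inputs the sketch reports start_idx=2, a three-page fault range and
bytes_mapped capped at bcnt, which mirrors how the driver reports
*bytes_mapped back to the WQE page-fault machinery.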