From patchwork Wed May 19 02:39:33 2021
X-Patchwork-Submitter: "Pan, Xinhui"
X-Patchwork-Id: 12266015
From: "Pan, Xinhui"
To: "amd-gfx@lists.freedesktop.org"
Subject: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin
Date: Wed, 19 May 2021 02:39:33 +0000
References: <20210519022852.16766-1-xinhui.pan@amd.com>
In-Reply-To: <20210519022852.16766-1-xinhui.pan@amd.com>
Cc: "Deucher, Alexander", "Kuehling, Felix", "Koenig, Christian", "dri-devel@lists.freedesktop.org"

[AMD Official Use Only]

To observe the issue, I made a kfdtest case for debugging.
It just allocates userptr memory and checks whether the memory is corrupted.
I can hit this failure within 2 minutes. :(

---
2.25.1

diff --git a/tests/kfdtest/src/KFDMemoryTest.cpp b/tests/kfdtest/src/KFDMemoryTest.cpp
index 70c8033..a72f53f 100644
--- a/tests/kfdtest/src/KFDMemoryTest.cpp
+++ b/tests/kfdtest/src/KFDMemoryTest.cpp
@@ -584,6 +584,32 @@ TEST_F(KFDMemoryTest, ZeroMemorySizeAlloc) {
     TEST_END
 }
 
+TEST_F(KFDMemoryTest, swap) {
+    TEST_START(TESTPROFILE_RUNALL)
+
+    unsigned int size = 128<<20;
+    unsigned int*tmp = (unsigned int *)mmap(0,
+                    size,
+                    PROT_READ | PROT_WRITE,
+                    MAP_ANONYMOUS | MAP_PRIVATE,
+                    -1,
+                    0);
+    EXPECT_NE(tmp, MAP_FAILED);
+
+    LOG() << "pls run this with KFDMemoryTest.LargestSysBufferTest" << std::endl;
+    do {
+        memset(tmp, 0xcc, size);
+
+        HsaMemoryBuffer buf(tmp, size);
+        sleep(1);
+        EXPECT_EQ(tmp[0], 0xcccccccc);
+    } while (true);
+
+    munmap(tmp, size);
+
+    TEST_END
+}
+
 // Basic test for hsaKmtAllocMemory
 TEST_F(KFDMemoryTest, MemoryAlloc) {
     TEST_START(TESTPROFILE_RUNALL)
-- 
2.25.1

________________________________________
From: Pan, Xinhui
Sent: May 19, 2021 10:28
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix; Deucher, Alexander; Koenig, Christian; dri-devel@lists.freedesktop.org; daniel@ffwll.ch; Pan, Xinhui
Subject: [RFC PATCH
 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

cpu 1                                           cpu 2
kfd alloc BO A(userptr)                         alloc BO B(GTT)
    ->init -> validate                          -> init -> validate -> populate
    init_user_pages                               -> swapout BO A //hit ttm pages limit
        -> get_user_pages (fill up ttm->pages)
         -> validate -> populate
          -> swapin BO A // Now hit the BUG

We know that get_user_pages may race with swapout on the same BO.
There are some issues I have met.
1) memory corruption.
This is because we do a swap before memory is set up. ttm_tt_swapout() just
creates a swap_storage with its content being 0x0. So when we set up memory
after the swapout, the following swapin makes the memory corrupted.

2) panic
When swapout happens concurrently with get_user_pages, they both touch
ttm->pages without any lock. It causes memory corruption too, but I mostly
hit page faults.

Signed-off-by: xinhui pan
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 928e8d57cd08..42460e4480f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -835,6 +835,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 	struct amdkfd_process_info *process_info = mem->process_info;
 	struct amdgpu_bo *bo = mem->bo;
 	struct ttm_operation_ctx ctx = { true, false };
+	struct page **pages;
 	int ret = 0;
 
 	mutex_lock(&process_info->lock);
@@ -852,7 +853,13 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 		goto out;
 	}
 
-	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
+	pages = kvmalloc_array(bo->tbo.ttm->num_pages,
+			sizeof(struct page *),
+			GFP_KERNEL | __GFP_ZERO);
+	if (!pages)
+		goto unregister_out;
+
+	ret = amdgpu_ttm_tt_get_user_pages(bo, pages);
 	if (ret) {
 		pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
 		goto unregister_out;
@@ -863,6 +870,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 		pr_err("%s: Failed to reserve BO\n", __func__);
 		goto release_out;
 	}
+
+	WARN_ON_ONCE(bo->tbo.ttm->page_flags & TTM_PAGE_FLAG_SWAPPED);
+
+	memcpy(bo->tbo.ttm->pages,
+			pages,
+			sizeof(struct page*) * bo->tbo.ttm->num_pages);
 	amdgpu_bo_placement_from_domain(bo, mem->domain);
 	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 	if (ret)
@@ -872,6 +885,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 release_out:
 	amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 unregister_out:
+	kvfree(pages);
 	if (ret)
 		amdgpu_mn_unregister(bo);
 out:
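
For readers following the hunks above: the change gathers the user pages into a private array first and copies them into ttm->pages only after the BO has been reserved, so a concurrent swapout should never see a half-filled array. Below is a minimal user-space sketch of that staging idea, not kernel code. Everything in it is an assumption made for illustration: struct shared_table, fill_entries() and publish_entries() are hypothetical names that do not exist in amdgpu or TTM, a pthread mutex stands in for the BO reservation, and calloc() stands in for kvmalloc_array() with __GFP_ZERO. Returning -ENOMEM on allocation failure is a choice made for this sketch.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BO and its page array; not an amdgpu/TTM type. */
struct shared_table {
	pthread_mutex_t lock;	/* plays the role of the BO reservation */
	size_t num_entries;
	void **entries;		/* plays the role of ttm->pages */
};

/* Hypothetical producer standing in for get_user_pages(). */
static int fill_entries(void **dst, size_t n, char *base, size_t stride)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = base + i * stride;
	return 0;
}

/*
 * Gather into a private staging array without holding the lock, then
 * publish under the lock. Other lock holders see either the old
 * contents or the complete new ones, never a partial fill.
 */
int publish_entries(struct shared_table *t, char *base, size_t stride)
{
	void **staging;
	int ret;

	staging = calloc(t->num_entries, sizeof(*staging));
	if (!staging)
		return -ENOMEM;	/* explicit error code: a choice of this sketch */

	ret = fill_entries(staging, t->num_entries, base, stride);
	if (ret)
		goto out_free;

	pthread_mutex_lock(&t->lock);
	memcpy(t->entries, staging, sizeof(*staging) * t->num_entries);
	pthread_mutex_unlock(&t->lock);

out_free:
	free(staging);
	return ret;
}
```

A caller would initialise the mutex with pthread_mutex_init(), point entries at an array it owns, and call publish_entries(); linking needs -pthread. The staging copy costs one extra allocation per call, which is the price for keeping the slow gather step outside the lock.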