From patchwork Wed Apr 28 15:11:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12229313
From: Andrey Grodzovsky
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-pci@vger.kernel.org, ckoenig.leichtzumerken@gmail.com,
 daniel.vetter@ffwll.ch, Harry.Wentland@amd.com
Cc: ppaalanen@gmail.com, Alexander.Deucher@amd.com,
 gregkh@linuxfoundation.org, helgaas@kernel.org, Felix.Kuehling@amd.com,
 Andrey Grodzovsky, Christian König
Subject: [PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released
Date: Wed, 28 Apr 2021 11:11:55 -0400
Message-Id: <20210428151207.1212258-16-andrey.grodzovsky@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210428151207.1212258-1-andrey.grodzovsky@amd.com>
References: <20210428151207.1212258-1-andrey.grodzovsky@amd.com>
X-Mailing-List: linux-pci@vger.kernel.org

Problem: If the scheduler is already stopped by the time a sched_entity
is released and the entity's job_queue is not empty, I encountered
a hang in drm_sched_entity_flush. This is because
drm_sched_entity_is_idle never becomes true.

Fix: In drm_sched_fini, detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wake up all processes stuck in sched_entity flushing,
since the scheduler main thread, which normally wakes them up, is
stopped by now.

v2:
Reverse the order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinsertion back into the rq due
to a race.
v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_entity.c |  3 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index f0790e9471d1..cb58f692dad9 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -116,7 +116,8 @@ static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
 	rmb(); /* for list_empty to work without lock */
 
 	if (list_empty(&entity->list) ||
-	    spsc_queue_count(&entity->job_queue) == 0)
+	    spsc_queue_count(&entity->job_queue) == 0 ||
+	    entity->stopped)
 		return true;
 
 	return false;
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 908b0b56032d..ba087354d0a8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -897,9 +897,33 @@ EXPORT_SYMBOL(drm_sched_init);
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
+	struct drm_sched_entity *s_entity;
+	int i;
+
 	if (sched->thread)
 		kthread_stop(sched->thread);
 
+	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+		struct drm_sched_rq *rq = &sched->sched_rq[i];
+
+		if (!rq)
+			continue;
+
+		spin_lock(&rq->lock);
+		list_for_each_entry(s_entity, &rq->entities, list)
+			/*
+			 * Prevents reinsertion and marks job_queue as idle,
+			 * it will be removed from rq in drm_sched_entity_fini
+			 * eventually
+			 */
+			s_entity->stopped = true;
+		spin_unlock(&rq->lock);
+
+	}
+
+	/* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
+	wake_up_all(&sched->job_scheduled);
+
 	/* Confirm no work left behind accessing device structures */
 	cancel_delayed_work_sync(&sched->work_tdr);