From patchwork Thu Mar 22 20:59:24 2018
X-Patchwork-Submitter: Parav Pandit
X-Patchwork-Id: 10302317
From: Parav Pandit
To: Jason Gunthorpe, linux-rdma@vger.kernel.org, Leon Romanovsky, Mark Bloch
Cc: Dmitry Vyukov, syzbot, Daniel Jurgens, dledford@redhat.com,
 Johannes Berg, syzkaller-bugs@googlegroups.com
Subject: RE: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
 process_one_req
Date: Thu, 22 Mar 2018 20:59:24 +0000
Message-ID: 
References: <20180322200423.GA30368@ziepe.ca>
In-Reply-To: <20180322200423.GA30368@ziepe.ca>
List-ID: linux-rdma@vger.kernel.org
Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Thursday, March 22, 2018 3:04 PM
> To: linux-rdma@vger.kernel.org; Leon Romanovsky; Parav Pandit; Mark Bloch
> Cc: Dmitry Vyukov; syzbot; Daniel Jurgens; dledford@redhat.com;
> Johannes Berg; syzkaller-bugs@googlegroups.com
> Subject: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> process_one_req
>
> process_one_req() can race with rdma_addr_cancel():
>
>            CPU0                         CPU1
>            ====                         ====
>  process_one_work()
>    debug_work_deactivate(work);
>    process_one_req()
>                                 rdma_addr_cancel()
>                                   mutex_lock(&lock);
>                                   set_timeout(&req->work,..);
>                                     __queue_work()
>                                       debug_work_activate(work);
>                                   mutex_unlock(&lock);
>
>    mutex_lock(&lock);
>    [..]
>    list_del(&req->list);
>    mutex_unlock(&lock);
>    [..]
>
>    // ODEBUG explodes since the work is still queued.
>    kfree(req);
>
> Causing ODEBUG to detect the use after free:
>
> ODEBUG: free active (active state 0) object type: work_struct hint:
>         process_one_req+0x0/0x6c0 include/net/dst.h:165
> WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291
>         debug_print_object+0x166/0x220 lib/debugobjects.c:288
> kvm: emulating exchange as write
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
> Hardware name: Google Google Compute Engine/Google Compute Engine,
>         BIOS Google 01/01/2011
> Workqueue: ib_addr process_one_req
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x1f4/0x2b0 lib/bug.c:186
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
> RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
> RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
> R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
> R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
>  __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>  debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kfree+0xc7/0x260 mm/slab.c:3799
>  process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
>  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>
> Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC")
> Reported-by: syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com
> Signed-off-by: Jason Gunthorpe
> ---
>  drivers/infiniband/core/addr.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> Leon, I took a look at this last bug you noted so we can get cleaned up
> for the next kernel release.
>
> I didn't repro it, but I did confirm the C repro is calling
> rdma_addr_cancel(), so I think this is very likely to be the bug..
>
> Parav/Mark: Does this make sense?
>

Thanks for looking into it. Though I didn't ack on the ML, I had debugged
a similar report a few days back and fixed and verified it differently:
rdma_addr_cancel() does cancel_delayed_work_sync() and completes the
request after that, and if the request is already cancelled,
process_one_req() skips processing it (a sketch of that alternative is at
the end of this mail).

I wasn't sure whether a work item can reliably cancel itself while it is
running. Do you know? cancel_delayed_work_sync() appears more robust to
me, following other similar users, and it also avoids canceling the work
on every execution of process_one_req(). Mark has also reviewed that fix,
and it is already in Leon's queue.

Assuming cancel_delayed_work() can cancel the work item from within
itself, per its kernel-doc comment saying it can be called from any
context, the patch you posted should work too. Below is a minor modified
hunk, on top of your patch, that shares the completion code between
process_one_req() and process_req().

---
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index b0a52c9..0679f4f 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -561,6 +561,23 @@ static int addr_resolve(struct sockaddr *src_in,
 	return ret;
 }
 
+static void complete_addr_req(struct addr_req *req)
+{
+	/*
+	 * Although the work will normally have been canceled by the
+	 * workqueue, it can still be requeued by rdma_addr_cancel() as
+	 * long as it is on the req_list, so it could have been requeued
+	 * before we grabbed &lock.
+	 * We need to cancel it after it is removed from req_list to
+	 * really be sure it is safe to free.
+	 */
+	cancel_delayed_work(&req->work);
+	req->callback(req->status, (struct sockaddr *) &req->src_addr,
+		      req->addr, req->context);
+	put_client(req->client);
+	kfree(req);
+}
+
 static void process_one_req(struct work_struct *_work)
 {
 	struct addr_req *req;
@@ -586,10 +603,7 @@ static void process_one_req(struct work_struct *_work)
 	list_del(&req->list);
 	mutex_unlock(&lock);
 
-	req->callback(req->status, (struct sockaddr *)&req->src_addr,
-		      req->addr, req->context);
-	put_client(req->client);
-	kfree(req);
+	complete_addr_req(req);
 }
 
 static void process_req(struct work_struct *work)
@@ -621,15 +635,7 @@ static void process_req(struct work_struct *work)
 	list_for_each_entry_safe(req, temp_req, &done_list, list) {
 		list_del(&req->list);
 
-		/* It is safe to cancel other work items from this work item
-		 * because at a time there can be only one work item running
-		 * with this single threaded work queue.
-		 */
-		cancel_delayed_work(&req->work);
-		req->callback(req->status, (struct sockaddr *) &req->src_addr,
-			      req->addr, req->context);
-		put_client(req->client);
-		kfree(req);
+		complete_addr_req(req);
 	}
 }
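
For completeness, here is roughly what the alternative I verified looks
like. This is a sketch only, not the patch in Leon's queue: the "bool
cancelled" field on struct addr_req is something added for the
experiment (it does not exist in addr.c today), and the address
resolution and retry logic in process_one_req() is elided.

void rdma_addr_cancel(struct rdma_dev_addr *addr)
{
	struct addr_req *found = NULL;
	struct addr_req *req, *temp_req;

	mutex_lock(&lock);
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (req->addr == addr) {
			/* Tell a concurrent process_one_req() to back off;
			 * both sides decide ownership under &lock.
			 */
			req->cancelled = true;
			req->status = -ECANCELED;
			list_del(&req->list);
			found = req;
			break;
		}
	}
	mutex_unlock(&lock);

	if (!found)
		return;

	/*
	 * We are not running on the ib_addr workqueue here, so waiting
	 * is safe; this also kills any requeue done by set_timeout().
	 */
	cancel_delayed_work_sync(&found->work);
	complete_addr_req(found);
}

static void process_one_req(struct work_struct *_work)
{
	struct addr_req *req;

	req = container_of(to_delayed_work(_work), struct addr_req, work);

	/* ... address resolution and requeue-on-retry elided ... */

	mutex_lock(&lock);
	if (req->cancelled) {
		/* rdma_addr_cancel() already took the request off
		 * req_list and will complete and free it after its
		 * synchronous cancel returns.
		 */
		mutex_unlock(&lock);
		return;
	}
	list_del(&req->list);
	mutex_unlock(&lock);

	complete_addr_req(req);
}

On the self-cancel question: as far as I can tell from the workqueue
kernel-doc, cancel_delayed_work() only clears the pending state and may
be called from any context, including the work item itself, so your
patch should be safe; cancel_delayed_work_sync() is the one that must
not be called from the work it cancels, since it waits for that work to
finish.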