From patchwork Wed Mar 2 11:26:10 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 8480211
Return-Path:
X-Original-To: patchwork-ceph-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
 by patchwork2.web.kernel.org (Postfix) with ESMTP id E1E0DC0553
 for ; Wed, 2 Mar 2016 11:26:16 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
 by mail.kernel.org (Postfix) with ESMTP id 37BB420260
 for ; Wed, 2 Mar 2016 11:26:16 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.kernel.org (Postfix) with ESMTP id 6BF452025B
 for ; Wed, 2 Mar 2016 11:26:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755070AbcCBL0N (ORCPT );
 Wed, 2 Mar 2016 06:26:13 -0500
Received: from mail-wm0-f47.google.com ([74.125.82.47]:34594 "EHLO
 mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755061AbcCBL0M (ORCPT );
 Wed, 2 Mar 2016 06:26:12 -0500
Received: by mail-wm0-f47.google.com with SMTP id p65so75332072wmp.1
 for ; Wed, 02 Mar 2016 03:26:11 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:cc;
 bh=9xOSfWftDTQ7s0zXMQjc4YuCYm200vdtpuRuwwyXmyc=;
 b=XBsyhcT7bs8l6jaOwKADXXc3D5k/d1kAe9SI5V6Izmz+WMkvtgx1oJDj11g8yLYj3e
 rROa2rY1/MMFUbk/tEclXRcoGwbCGixgKskj01/T5HwzScC3/XgZ29i40gCiro0lw3+D
 lJfYr4dxCpI2EK9mfWgcq9EGCpZvt9kP4M0ptXmFxZEWqvz6KXY2MFrhPMxfNzRHc2rK
 FEmgy6WkbpROlMsQ6qX1KNANMYBYGu0zf9uCgaTIE0hU0hzcYBXaf700EJJDNHg5c+ot
 jllJ5+aof06FSNsQE+B1xB0PPkSaPvTtbgg/0CyZR1Cw2VDqd/9y+5csQx9TznOSWOLN
 Kw1A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:date:message-id:subject:from:to:cc;
 bh=9xOSfWftDTQ7s0zXMQjc4YuCYm200vdtpuRuwwyXmyc=;
 b=l9/EkJOJtLMT0s6THknpH6X8T6Ot3aNFYeFib6NuHZHM/Jlw3UZnYPWGSNFM22fqqH
 pnp/m8n3Lu/VDfuFuSuupfY3tEAw5HedZWlxpDH4f5nxrt2Q/CQ+3d+XLKSb/pyYixbP
 Du/U25vyXlknMIqvK80GcKV5D0Wa+edTyOOTzgL/xIm+9wVy9veFZw6eLAtWg344wYE9
 vsHR5NbG/ELoqP1dg5aL1kuWhGNWO2BcXtPizDipe9Ii2n9+OvCE2DN0uqkH59nccq/f
 PiLQnCIi5XWIFryy176lr1t6EIfsqMqx6pu3lbdrIyy8jBqlqbm21oq0fZ6RGZixAgGW
 /0RQ==
X-Gm-Message-State: AD7BkJKdo8JZtn05KUd67WDPdb0xrkulf9u+ohLXI0IImUwflxha38wLrwEw0t/cxc+1zPkmEN9EFE9f7GW4Ug==
MIME-Version: 1.0
X-Received: by 10.28.104.131 with SMTP id d125mr3779979wmc.99.1456917970863;
 Wed, 02 Mar 2016 03:26:10 -0800 (PST)
Received: by 10.194.223.102 with HTTP; Wed, 2 Mar 2016 03:26:10 -0800 (PST)
Date: Wed, 2 Mar 2016 12:26:10 +0100
Message-ID:
Subject: Issues with exclusive-lock code on testing
From: Ilya Dryomov
To: Doug Fuller
Cc: Ceph Development
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org
X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,
 DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI,
 RP_MATCHES_RCVD, T_DKIM_INVALID, T_TVD_MIME_EPI,UNPARSEABLE_RELAY
 autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Hi Doug,

The cause of that memory corruption is a premature (duplicate, too)
call to rbd_obj_request_complete() in the !object-map DELETE case.
You've got:

    rbd_osd_req_callback
    rbd_osd_delete_callback
    rbd_osd_discard_callback
    rbd_obj_request_complete
    completion>
    ...
    rbd_obj_request_put  <- !!!
    rbd_obj_request_complete
    completion>

I also spotted two memory leaks on the NOTIFY_COMPLETE path in
__do_event(). The event one is trivial; I have a question about the
page vector one. The data item is allocated in alloc_msg() and the
actual buffer is then passed into __do_event() and eventually into
rbd_send_async_notify(), but not further up the stack.
Is anything going to use it? If not, we should remove it entirely.

Another thing that caught my eye is that your diff adds a bunch of
ceph_get_snap_context() calls on header.snapc with no corresponding
puts. My understanding is that the ones around rbd_image_request_fill()
are there to work around the fact that rbd_queue_workfn() isn't used,
but the one in rbd_obj_delete_sync() is immediately followed by
ceph_osdc_build_request(), which bumps snapc, and so is almost certainly
a leak.

The attached patch fixes the use-after-free and plugs those leaks.
With it applied, your test loop runs fine for me - no crashes or
out-of-memory problems.

Thanks,

                Ilya

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 92c354256055..c0198b6ca605 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1930,14 +1930,14 @@ static void rbd_osd_delete_callback(struct rbd_obj_request *obj_request)
 	u8 current_state;
 
 	if (!obj_request->img_request) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
 	rbd_dev = obj_request->img_request->rbd_dev;
 
 	if (!rbd_use_object_map(rbd_dev)) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
@@ -3632,10 +3632,13 @@ static int rbd_send_async_notify(struct rbd_device *rbd_dev,
 	}
 
 	completed = ceph_osdc_wait_event(osdc, notify_event);
-	if (!completed)
+	if (!completed) {
 		ret = -ETIMEDOUT;
-	else
+	} else {
 		ret = notify_event->notify.return_code;
+		ceph_release_page_vector(notify_event->notify.notify_data,
+			calc_pages_for(0, notify_event->notify.notify_data_len));
+	}
 
 cancel_event:
 	ceph_osdc_cancel_event(notify_event);
@@ -4828,7 +4831,6 @@ static int rbd_obj_delete_sync(struct rbd_device *rbd_dev,
 
 	//obj_request->osd_req->r_priv = obj_request;
 
-	ceph_get_snap_context(rbd_dev->header.snapc);
 	osd_req_op_init(obj_request->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	rbd_osd_req_format_snap_write(obj_request, rbd_dev->header.snapc);
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8316a304af63..12841c5a09c7 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2942,6 +2942,7 @@ static void __do_event(struct ceph_osd_client *osdc, u8 opcode,
 				event->osd_req = NULL;
 			}
 			complete_all(&event->notify.complete);
+			ceph_osdc_put_event(event);
 		}
 		break;
 	default: