From patchwork Sun Aug 14 17:20:52 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: James Bottomley X-Patchwork-Id: 9279511 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 2816F60839 for ; Sun, 14 Aug 2016 17:21:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0966C289C7 for ; Sun, 14 Aug 2016 17:21:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EC36028A58; Sun, 14 Aug 2016 17:20:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EA05289C7 for ; Sun, 14 Aug 2016 17:20:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751862AbcHNRU4 (ORCPT ); Sun, 14 Aug 2016 13:20:56 -0400 Received: from bedivere.hansenpartnership.com ([66.63.167.143]:43982 "EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751015AbcHNRUz (ORCPT ); Sun, 14 Aug 2016 13:20:55 -0400 Received: from localhost (localhost [127.0.0.1]) by bedivere.hansenpartnership.com (Postfix) with ESMTP id 5517B8EE18B; Sun, 14 Aug 2016 10:20:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=hansenpartnership.com; s=20151216; t=1471195254; bh=UM04Npb6VuyS3RyhKW2ToSEymEDqNmuAKcF370aNbC0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=JKtIrZBLlTP4BFWQpJasQmSdkhswy1pAkKFOM7rBjj1LzvAydHPxLMnUOPNg9/vRE M6QynjoCsgk/XVyYkfY7dNkh329XomWsY+jogsfFss0UpFYDNfJuKWJyIa1jx2BOca qrEBC2fGF2DwNxFIFj7fbSW6H+TAcfPxEro6Tgzw= Received: from bedivere.hansenpartnership.com ([127.0.0.1]) by localhost (bedivere.hansenpartnership.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WX0fQNB01Az3; Sun, 14 Aug 2016 10:20:54 -0700 (PDT) Received: from [153.66.254.194] (unknown [50.46.144.141]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by bedivere.hansenpartnership.com (Postfix) with ESMTPSA id 6058A8EE17F; Sun, 14 Aug 2016 10:20:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=hansenpartnership.com; s=20151216; t=1471195253; bh=UM04Npb6VuyS3RyhKW2ToSEymEDqNmuAKcF370aNbC0=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=HfWma6BN31rrHYecRmnfLhiAdjWB1VHWcJjANUGEOqMgPL7JgVvwaO2hAnzRcCzQa vH3itQuUTP73yQPvRSzLTMPoUARYpw8yAW03Md8OqpivRyhr9zsdAN/kFRiOZ9hQ0W lt46WvObN6g25OhYUPwNe1199qhwdYOChykdIq4g= Message-ID: <1471195252.2355.18.camel@HansenPartnership.com> Subject: Re: Time to make dynamically allocated devt the default for scsi disks? From: James Bottomley To: Dan Williams Cc: linux-block@vger.kernel.org, linux-scsi , Jens Axboe , "Martin K. Petersen" , Christoph Hellwig , Tejun Heo , Dave Hansen Date: Sun, 14 Aug 2016 10:20:52 -0700 In-Reply-To: References: <1471047451.2407.95.camel@HansenPartnership.com> <1471101800.2397.9.camel@HansenPartnership.com> <1471110212.2397.39.camel@HansenPartnership.com> X-Mailer: Evolution 3.16.5 Mime-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Sat, 2016-08-13 at 11:27 -0700, Dan Williams wrote: > On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley > wrote: > > On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote: > > > On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley > > > wrote: > > > > It does? The race is the fact that the parent can be removed > > > > before the child meaning if the parent name is re-registered > > > > before the child dies we get a duplicate name in bdi space. > > > > > > No, the race is that the *name* of the parent isn't released > > > until the child is both unregistered and put. The device core is > > > already ensuring that the parent is not released until all > > > descendants have been removed. > > > > We're both saying the same thing: the problem is that, with > > df08c32ce3be the bdi name lifetime is tied to the lifetime of the > > gendisk. However, the parent of the gendisk currently is only tied > > to the visibility lifetime of the gendisk, not the final put > > lifetime, so it doesn't see this. > > > > > > > > > > > So I tried the attached and it makes the libnvdimm unit tests > > > > > start crashing. > > > > > > > > Well, the attached is clearly buggy, isn't it? You're trying > > > > to do a get on the parent before the parent is actually set. > > > > > > Ah, yes, thank you. Fixed up v2 attached that passes my tests. > > > > > > > Why don't you just try the incremental patch I sent instead of > > > > trying to rework it? > > > > > > I reworked it because it is the bdi that holds this extra > > > dependency on the disk's parent, not the disk itself. > > > > Philosophically I don't like this approach. The dependency goes > > > > bdi->gendisk->parent > > I'm arguing that there is no bdi->gendisk dependency. You created one with your bdi->owner field. Just because you didn't call it a parent doesn't mean it wasn't one. Arguably the whole bdi thing is strangely done because gendisk treats it like a class and that's how it behaves, it just doesn't have a proper class structure (which is why gendisk creates the link that would be done by the class infrastructure) > The dependency is: > > bdi->devt devt isn't a device (in the struct device sense). It exists as effectively an embedded component of the bdi. As far as I can tell there's no reason for it to be separately allocated, it could be properly embedded as is the normal pattern. > It just so happens that block-dynamic devt is released in > disk_release(). > > > Making the bdi manage the parent lifetime is an unusual pattern. > > Making the parent stay around until the last reference to gendisk > > is put is the usual one. > > What's unusual is the bdi's dependency on the allocated name, not the > gendisk itself. A name is just a resource belonging to an object. The object it belongs to is the bdi and the bdi is parented by the owner field (and a hokey link) to the gendisk. > > > > > A couple crash logs attached. Not yet sure what assumption > > > > > is getting violated, but how about that conversion of scsi to > > > > > use dynamic devt? ;-) > > > > > > > > It's completely orthogonal. The problem is in hierarchy > > > > lifetimes: switching from static to dynamic allocation won't > > > > change that at all. You don't see this problem in nvme because > > > > the parent control device's lifetime belongs to the controller > > > > not the disk. In SCSI the parent is our representation of the > > > > SCSI device whose lifetime is governed at the SCSI level and > > > > effectively represents the disk. > > > > > > > > > > No, it's only the name. We could achieve the same by teaching > > > the block core to manage the "sd_index_ida" instead of the sd > > > driver itself, but the v2-patch attached works and does not > > > introduce that layering violation. > > > > Um, so this patch doesn't fix the problem. It merely makes the > > lifetime rules correct so the problem can then be fixed at the scsi > > level. > > You're right that this patch does not fix the problem, I missed that > the scsi_disk is not the parent of the gendisk, so this patch does > nothing to delay scsi_disk_release. What I think is the real fix is > to make the devt properly reference counted and prevent > ida_remove(&sd_index_ida, sdkp->index); from being called until all > objects derived from that allocation are done with it. OK, this is another philosophical difference, I suppose: since bdi is already so complex and non-standard, I really don't think adding more non standard stuff is a good idea. The simplest way to fix it is 1. The two line patch I already sent to make the bdi hold the owner ->parent until release 2. Parent the gendisk to scsi_disk->dev.  The name release is already in the correct place, so this is a 3 line patch. These are established patterns, so they're both understandable to anyone who reads the code. The answer to any other BDI lifetime problem is to free the name in the parent release. James --- -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d3e852a..222771d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie) } blk_pm_runtime_init(sdp->request_queue, dev); - device_add_disk(dev, gd); + /* + * previously the parent of the gendisk was the scsi device. It + * was moved to fix lifetime rules, so now we install a symlink + * to the new location of the block class directory + */ + device_add_disk(&sdkp->dev, gd); + WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block")); if (sdkp->capacity) sd_dif_config_host(sdkp); @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev) async_synchronize_full_domain(&scsi_sd_pm_domain); async_synchronize_full_domain(&scsi_sd_probe_domain); + sysfs_remove_link(&dev->kobj, "block"); device_del(&sdkp->dev); del_gendisk(sdkp->disk); sd_shutdown(dev);