From patchwork Tue Aug 20 15:16:02 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 2847159 Return-Path: X-Original-To: patchwork-dm-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id C8357BF546 for ; Tue, 20 Aug 2013 15:27:29 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D9AA1204D5 for ; Tue, 20 Aug 2013 15:27:24 +0000 (UTC) Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24]) by mail.kernel.org (Postfix) with ESMTP id E621F204C4 for ; Tue, 20 Aug 2013 15:27:19 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id r7KFGAxE015621; Tue, 20 Aug 2013 11:16:12 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id r7KFG988016948 for ; Tue, 20 Aug 2013 11:16:09 -0400 Received: from mx1.redhat.com (ext-mx14.extmail.prod.ext.phx2.redhat.com [10.5.110.19]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r7KFG8oT007213; Tue, 20 Aug 2013 11:16:08 -0400 Received: from hrndva-omtalb.mail.rr.com (hrndva-omtalb.mail.rr.com [71.74.56.122]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r7KFG4LW028942; Tue, 20 Aug 2013 11:16:04 -0400 X-Authority-Analysis: v=2.0 cv=e9yEuNV/ c=1 sm=0 a=Sro2XwOs0tJUSHxCKfOySw==:17 a=Drc5e87SC40A:10 a=mVp6cRoBB8kA:10 a=5SG0PmZfjMsA:10 a=kj9zAlcOel0A:10 a=meVymXHHAAAA:8 a=KGjhK52YXX0A:10 a=jyE7P5GObSEA:10 a=-n4FZ2bsAJO2NpbtDhMA:9 a=CjuIK1q_8ugA:10 a=jeBq3FmKZ4MA:10 a=Sro2XwOs0tJUSHxCKfOySw==:117 X-Cloudmark-Score: 0 X-Authenticated-User: X-Originating-IP: 67.255.60.225 Received: from [67.255.60.225] ([67.255.60.225:53666] helo=gandalf.local.home) by hrndva-oedge01.mail.rr.com (envelope-from ) (ecelerity 2.2.3.46 r()) with ESMTP id 57/48-18705-33883125; Tue, 20 Aug 2013 15:16:03 +0000 Date: Tue, 20 Aug 2013 11:16:02 -0400 From: Steven Rostedt To: Kent Overstreet , LKML , linux-bcache@vger.kernel.org, dm-devel@redhat.com Message-ID: <20130820111602.3cea203c@gandalf.local.home> Mime-Version: 1.0 X-RedHat-Spam-Score: -2.309 (BAYES_00, DCC_REPUT_00_12, RCVD_IN_DNSWL_NONE, URIBL_BLOCKED) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Scanned-By: MIMEDefang 2.68 on 10.5.110.19 X-MIME-Autoconverted: from quoted-printable to 8bit by lists01.pubmisc.prod.ext.phx2.redhat.com id r7KFG988016948 X-loop: dm-devel@redhat.com Cc: Christoph Hellwig , David Howells , Thomas Gleixner , Sebastian@redhat.com, Andrzej Siewior Subject: [dm-devel] [PATCH] bcache: Remove use of down/up_read_non_owner() X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk Reply-To: device-mapper development List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The down/up_read_non_owner() is a nasty hack in the API of the rwsem operations. It was once removed, but then resurrected for use with bcache. Not only is the API an abomination to the rwsem API, it also prevents bcache from ever being compiled with PREEMPT_RT, as the RT kernel requires all rwsems to have an owner. Instead of keeping the down/up_read_non_owner() around, it is better to modify the one user of it and have it do something a bit differently. >From reading the bcache code, the reason for the non_owner usage is that a request is made, and writers must wait for that request to finish before they can continue. But because the request is completed by another task, the rwsem can not be held for read and then released on completion. Instead of abusing the rwsem code for this, I added a refcount "nr_requests" to the cached_dev structure as well as a "write_waiters" wait queue. When a request is to be made, the rwsem is still taken for read, but this time with an owner. The refcount is incremented and before exiting the function, the rwsem is released. The writer will then take the rwsem for write, check the refcount, if it is not zero, it will release the rwsem, add itself to a wait_event() waiting for refcount to become zero, and then try again. This should be a suitable replacement for the abuse of the rwsem non_owner API. Signed-off-by: Steven Rostedt --- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel Index: linux-trace.git/drivers/md/bcache/bcache.h =================================================================== --- linux-trace.git.orig/drivers/md/bcache/bcache.h +++ linux-trace.git/drivers/md/bcache/bcache.h @@ -486,6 +486,15 @@ struct cached_dev { atomic_t running; /* + * Requests in progress. + */ + atomic_t nr_requests; + /* + * Writers waiting for requests to finish. + */ + wait_queue_head_t write_waiters; + + /* * Writes take a shared lock from start to finish; scanning for dirty * data to refill the rb tree requires an exclusive lock. */ Index: linux-trace.git/drivers/md/bcache/request.c =================================================================== --- linux-trace.git.orig/drivers/md/bcache/request.c +++ linux-trace.git/drivers/md/bcache/request.c @@ -998,7 +998,9 @@ static void cached_dev_write_complete(st struct search *s = container_of(cl, struct search, cl); struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); - up_read_non_owner(&dc->writeback_lock); + if (atomic_dec_return(&dc->nr_requests) == 0) + wake_up(&dc->write_waiters); + cached_dev_bio_complete(cl); } @@ -1024,7 +1026,8 @@ static void request_write(struct cached_ bch_keybuf_check_overlapping(&s->op.c->moving_gc_keys, &start, &end); check_should_skip(dc, s); - down_read_non_owner(&dc->writeback_lock); + down_read(&dc->writeback_lock); + atomic_inc(&dc->nr_requests); if (bch_keybuf_check_overlapping(&dc->writeback_keys, &start, &end)) { s->op.skip = false; @@ -1053,6 +1056,7 @@ static void request_write(struct cached_ } out: closure_call(&s->op.cl, bch_insert_data, NULL, cl); + up_read(&dc->writeback_lock); continue_at(cl, cached_dev_write_complete, NULL); skip: s->op.skip = true; Index: linux-trace.git/drivers/md/bcache/super.c =================================================================== --- linux-trace.git.orig/drivers/md/bcache/super.c +++ linux-trace.git/drivers/md/bcache/super.c @@ -1081,6 +1081,8 @@ static void register_bdev(struct cache_s dc->sb_bio.bi_io_vec[0].bv_page = sb_page; get_page(sb_page); + init_waitqueue_head(&dc->write_waiters); + if (cached_dev_init(dc, sb->block_size << 9)) goto err; Index: linux-trace.git/drivers/md/bcache/writeback.c =================================================================== --- linux-trace.git.orig/drivers/md/bcache/writeback.c +++ linux-trace.git/drivers/md/bcache/writeback.c @@ -121,6 +121,20 @@ static void dirty_init(struct keybuf_key bch_bio_map(bio, NULL); } +/* + * If requests are in transit, we must wait for them to finish. + */ +static void down_writeback_lock(struct cached_dev *dc) +{ +again: + down_write(&dc->writeback_lock); + if (atomic_read(&dc->nr_requests)) { + up_write(&dc->writeback_lock); + wait_event(dc->write_waiters, !atomic_read(&dc->nr_requests)); + goto again; + } +} + static void refill_dirty(struct closure *cl) { struct cached_dev *dc = container_of(cl, struct cached_dev, @@ -134,7 +148,7 @@ static void refill_dirty(struct closure !dc->writeback_running) closure_return(cl); - down_write(&dc->writeback_lock); + down_writeback_lock(dc); if (!atomic_read(&dc->has_dirty)) { SET_BDEV_STATE(&dc->sb, BDEV_STATE_CLEAN);