From patchwork Wed Mar 4 23:57:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 5941391
From: NeilBrown
To: Alexander Viro
Date: Thu, 05 Mar 2015 10:57:39 +1100
Subject: [PATCH 2/2] block_dev/DIO - cache one bio allocation when caching a DIO.
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <20150304235739.17330.85116.stgit@notabene.brown>
In-Reply-To: <20150304234911.17330.65139.stgit@notabene.brown>
References: <20150304234911.17330.65139.stgit@notabene.brown>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-fsdevel@vger.kernel.org

When performing an O_DIRECT write to a block device, a 'struct bio' is
allocated from a mempool. There is only one mempool for all block
devices, so if a single block device blocked indefinitely, the mempool
could in theory be exhausted and other block devices would be affected.

When mdmon needs to update RAID metadata (see previous patch) it needs
to perform an O_DIRECT write to some block devices while another block
device (the array) is frozen. This could conceivably lead to a
deadlock.

Rather than allocate one mempool per block device (which would be an
effective solution), this patch implements a single-bio pool for each
'struct dio' that is being used by an mlockall(MCL_FUTURE) process.

'cache_bio' is added to 'struct dio' and placed at the end so that it
isn't zeroed out regularly.

When an allocation is needed, the cached bio is used if it is present
and large enough. When a bio is freed, it is placed in 'cache_bio' if
that slot is empty or holds a smaller bio (which is then released).
Naturally the cached bio is freed when the file is closed.

All other allocations to serve O_DIRECT writes are further down the
stack and use mempools that cannot be exhausted by a frozen md array.

Signed-off-by: NeilBrown
---
 fs/direct-io.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index ece5e45933d2..554913e9cc30 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -139,12 +139,17 @@ struct dio {
 		struct page *pages[DIO_PAGES];	/* page buffer */
 		struct work_struct complete_work;/* deferred AIO completion */
 	};
+	struct bio *cache_bio;
 } ____cacheline_aligned_in_smp;
 
 static struct kmem_cache *dio_cache __read_mostly;
 
 void dio_free(struct dio *dio)
 {
+	if (dio->cache_bio) {
+		bio_put(dio->cache_bio);
+		dio->cache_bio = NULL;
+	}
 	kmem_cache_free(dio_cache, dio);
 }
 
@@ -362,13 +367,24 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	      struct block_device *bdev,
 	      sector_t first_sector, int nr_vecs)
 {
-	struct bio *bio;
+	struct bio *bio = NULL;
 
+	if ((dio->flags & DIO_PERSISTENT_DIO) && dio->cache_bio) {
+		spin_lock_irq(&dio->bio_lock);
+		if (dio->cache_bio &&
+		    dio->cache_bio->bi_max_vecs >= nr_vecs) {
+			bio = dio->cache_bio;
+			dio->cache_bio = NULL;
+			bio_reset(bio);
+		}
+		spin_unlock_irq(&dio->bio_lock);
+	}
 	/*
 	 * bio_alloc() is guaranteed to return a bio when called with
 	 * __GFP_WAIT and we request a valid number of vectors.
 	 */
-	bio = bio_alloc(GFP_KERNEL, nr_vecs);
+	if (!bio)
+		bio = bio_alloc(GFP_KERNEL, nr_vecs);
 
 	bio->bi_bdev = bdev;
 	bio->bi_iter.bi_sector = first_sector;
@@ -480,7 +496,21 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 				set_page_dirty_lock(page);
 			page_cache_release(page);
 		}
-		bio_put(bio);
+		if (dio->flags & DIO_PERSISTENT_DIO) {
+			spin_lock_irq(&dio->bio_lock);
+			if (dio->cache_bio &&
+			    dio->cache_bio->bi_max_vecs < bio->bi_max_vecs) {
+				bio_put(dio->cache_bio);
+				dio->cache_bio = NULL;
+			}
+			if (dio->cache_bio == NULL) {
+				dio->cache_bio = bio;
+				bio = NULL;
+			}
+			spin_unlock_irq(&dio->bio_lock);
+		}
+		if (bio)
+			bio_put(bio);
 	}
 	return uptodate ? 0 : -EIO;
 }
@@ -1144,8 +1174,11 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 		if (cmpxchg(&iocb->ki_filp->private_data, dio, NULL) != dio)
 			dio = NULL;
 	}
-	if (!dio)
+	if (!dio) {
 		dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
+		if (dio)
+			dio->cache_bio = NULL;
+	}
 	retval = -ENOMEM;
 	if (!dio)
 		goto out;
@@ -1169,7 +1202,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 							      end - 1);
 			if (retval) {
 				mutex_unlock(&inode->i_mutex);
-				kmem_cache_free(dio_cache, dio);
+				dio_free(dio);
 				goto out;
 			}
 		}
@@ -1205,7 +1238,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 			 * We grab i_mutex only for reads so we don't have
 			 * to release it here
 			 */
-			kmem_cache_free(dio_cache, dio);
+			dio_free(dio);
 			goto out;
 		}
 	}
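
For readers who want to poke at the take/return policy outside the kernel, the following is a rough userspace sketch of the one-slot cache described in the commit message. It is not the kernel code: 'toy_bio', 'toy_dio', 'toy_get', 'toy_put', 'toy_dio_release' and 'toy_demo' are hypothetical names invented for illustration, malloc()/free() stand in for bio_alloc()/bio_put(), and the bio_lock spinlock, the DIO_PERSISTENT_DIO check and bio_reset() are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct bio': only the capacity matters here. */
struct toy_bio {
	int max_vecs;
};

/* Hypothetical stand-in for 'struct dio': a single cached slot. */
struct toy_dio {
	struct toy_bio *cache_bio;
};

/* Counts how often we fall back to the shared allocator. */
static int toy_allocs;

static struct toy_bio *toy_bio_alloc(int nr_vecs)
{
	struct toy_bio *bio;

	assert(nr_vecs > 0);
	bio = malloc(sizeof(*bio));
	if (!bio)
		abort();
	bio->max_vecs = nr_vecs;
	toy_allocs++;
	return bio;
}

/* Use the cached bio if it is present and large enough, else allocate. */
static struct toy_bio *toy_get(struct toy_dio *dio, int nr_vecs)
{
	struct toy_bio *bio = NULL;

	if (dio->cache_bio && dio->cache_bio->max_vecs >= nr_vecs) {
		bio = dio->cache_bio;
		dio->cache_bio = NULL;
	}
	if (!bio)
		bio = toy_bio_alloc(nr_vecs);
	return bio;
}

/* On completion, keep the larger bio in the slot and free the other. */
static void toy_put(struct toy_dio *dio, struct toy_bio *bio)
{
	if (dio->cache_bio && dio->cache_bio->max_vecs < bio->max_vecs) {
		free(dio->cache_bio);
		dio->cache_bio = NULL;
	}
	if (dio->cache_bio == NULL) {
		dio->cache_bio = bio;
		bio = NULL;
	}
	free(bio);	/* free(NULL) is a no-op */
}

/* Mirrors dio_free(): drop the cached bio when the dio goes away. */
static void toy_dio_release(struct toy_dio *dio)
{
	free(dio->cache_bio);
	dio->cache_bio = NULL;
}

/* Runs a short request sequence and returns the number of allocations. */
static int toy_demo(void)
{
	struct toy_dio dio = { .cache_bio = NULL };
	struct toy_bio *bio;

	toy_allocs = 0;
	bio = toy_get(&dio, 4);	/* slot empty: allocates */
	toy_put(&dio, bio);	/* slot now holds capacity 4 */
	bio = toy_get(&dio, 2);	/* 4 >= 2: reuses the cached bio */
	toy_put(&dio, bio);
	bio = toy_get(&dio, 8);	/* 4 < 8: allocates, slot keeps the 4 */
	toy_put(&dio, bio);	/* 8 replaces 4, which is freed */
	bio = toy_get(&dio, 8);	/* 8 >= 8: reuses the cached bio */
	toy_put(&dio, bio);
	toy_dio_release(&dio);
	return toy_allocs;
}
```

Calling toy_demo() from a main() shows that only the first request and the first oversized request reach the allocator; every other request is served from the slot. Note that the slot only ever grows, which matches the patch: a smaller bio returned while a larger one is cached is simply released.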