From patchwork Wed Mar 4 23:57:39 2015
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 5941401
From: NeilBrown
To: Alexander Viro
Date: Thu, 05 Mar 2015 10:57:39 +1100
Subject: [PATCH 1/2] block_dev/DIO: Optionally allocate single 'struct dio' per file.
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <20150304235739.17330.94189.stgit@notabene.brown>
In-Reply-To: <20150304234911.17330.65139.stgit@notabene.brown>
References: <20150304234911.17330.65139.stgit@notabene.brown>

To be able to support RAID metadata operations in user-space, mdmon
(part of mdadm) sometimes needs to update the metadata on an array
before any future writes to the array are permitted. This is
particularly needed for recording a device failure.

If that array is being used for swap (and even to some extent when
just used for a filesystem) then any memory allocation performed by
mdmon can cause a deadlock if the allocation waits for data to be
written out to the array.

mdmon uses mlockall(MCL_FUTURE|MCL_CURRENT) and is careful not to
allocate any memory at the wrong time. However, the kernel sometimes
allocates memory on its behalf and this can deadlock.

Updating the metadata requires an O_DIRECT write to each of a number
of files (which were previously opened). Each write requires
allocating a 'struct dio'.

To avoid this deadlock risk, this patch caches the 'struct dio' the
first time it is allocated so that future writes on the file do not
require the allocation. It is cached in '->private_data' for the
struct file. Only a single struct is cached so only sequential
accesses are allocation-free.

The caching is only performed if mlockall(MCL_FUTURE) is in effect,
thus limiting the change to only those cases where it will bring a
benefit.
Effectively, the memory allocated for O_DIRECT access is 'locked' in
place for future use.

Signed-off-by: NeilBrown
---
 fs/block_dev.c     |  7 ++++++-
 fs/direct-io.c     | 18 ++++++++++++++++--
 include/linux/fs.h |  6 ++++++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 975266be67d3..ed55e5329563 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -155,7 +155,10 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
 
 	return __blockdev_direct_IO(rw, iocb, inode, I_BDEV(inode), iter,
 				    offset, blkdev_get_block,
-				    NULL, NULL, 0);
+				    NULL, NULL,
+				    current->mm &&
+				    (current->mm->def_flags & VM_LOCKED)
+				    ? DIO_PERSISTENT_DIO : 0);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
@@ -1567,6 +1570,8 @@ EXPORT_SYMBOL(blkdev_put);
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(filp->f_mapping->host);
+	if (filp->private_data)
+		dio_free(filp->private_data);
 	blkdev_put(bdev, filp->f_mode);
 	return 0;
 }
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b2e297..ece5e45933d2 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -143,6 +143,11 @@ struct dio {
 
 static struct kmem_cache *dio_cache __read_mostly;
 
+void dio_free(struct dio *dio)
+{
+	kmem_cache_free(dio_cache, dio);
+}
+
 /*
  * How many pages are in the queue?
  */
@@ -268,7 +273,9 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
 		aio_complete(dio->iocb, ret, 0);
 	}
 
-	kmem_cache_free(dio_cache, dio);
+	if (!(dio->flags & DIO_PERSISTENT_DIO) ||
+	    cmpxchg(&dio->iocb->ki_filp->private_data, NULL, dio) != NULL)
+		dio_free(dio);
 	return ret;
 }
 
@@ -1131,7 +1138,14 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	if (rw == READ && !iov_iter_count(iter))
 		return 0;
 
-	dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
+	dio = NULL;
+	if ((flags & DIO_PERSISTENT_DIO) &&
+	    (dio = iocb->ki_filp->private_data) != NULL) {
+		if (cmpxchg(&iocb->ki_filp->private_data, dio, NULL) != dio)
+			dio = NULL;
+	}
+	if (!dio)
+		dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
 	retval = -ENOMEM;
 	if (!dio)
 		goto out;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d71b5e1ff2..b821fa32ba3f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -52,6 +52,7 @@ struct seq_file;
 struct workqueue_struct;
 struct iov_iter;
 struct vm_fault;
+struct dio;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -2612,9 +2613,14 @@ enum {
 
 	/* filesystem can handle aio writes beyond i_size */
 	DIO_ASYNC_EXTEND = 0x04,
+
+	/* file->private_data is used to store a 'struct dio'
+	 * between calls */
+	DIO_PERSISTENT_DIO = 0x08,
 };
 
 void dio_end_io(struct bio *bio, int error);
+void dio_free(struct dio *dio);
 
 ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, struct iov_iter *iter, loff_t offset,
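
The single-slot caching described in the commit message (take the cached
'struct dio' if present, otherwise allocate; on completion park it back
if the slot is empty, otherwise free it) can be tried outside the kernel
with the rough user-space sketch below. It is not the patch's code. It
assumes C11 atomics in place of the kernel's cmpxchg() and
malloc()/free() in place of the dio_cache slab, and every name in it
(struct obj, struct slot, obj_get, obj_put, slot_close) is a
hypothetical stand-in for 'struct dio' and file->private_data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's 'struct dio'. */
struct obj {
	int payload;
};

/* Hypothetical stand-in for file->private_data: one cached object or NULL. */
struct slot {
	_Atomic(struct obj *) cached;
};

/* Take the cached object if there is one, otherwise allocate a new one. */
struct obj *obj_get(struct slot *s)
{
	struct obj *none = NULL;
	struct obj *o = atomic_load(&s->cached);

	/* Claim the cached object only if nobody else claimed it first. */
	if (o && !atomic_compare_exchange_strong(&s->cached, &o, none))
		o = NULL;
	if (!o)
		o = malloc(sizeof(*o));	/* may return NULL, as in the patch */
	return o;
}

/* Give an object back: park it if the slot is empty, otherwise free it. */
void obj_put(struct slot *s, struct obj *o)
{
	struct obj *expected = NULL;

	assert(o != NULL);
	if (!atomic_compare_exchange_strong(&s->cached, &expected, o))
		free(o);
}

/*
 * Release whatever is still cached, as a close path would. This sketch
 * uses an atomic exchange; the patch itself does a plain NULL check.
 */
void slot_close(struct slot *s)
{
	struct obj *none = NULL;

	free(atomic_exchange(&s->cached, none));
}
```

Because the slot holds one object, two overlapping users cannot both be
served from it: the second obj_get() falls back to malloc(), and
whichever obj_put() arrives second frees its object. This matches the
commit message's remark that only sequential accesses are
allocation-free.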