From patchwork Wed Mar 23 20:12:09 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 8653931 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 0E5D29F38C for ; Wed, 23 Mar 2016 20:12:31 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id F20F020121 for ; Wed, 23 Mar 2016 20:12:29 +0000 (UTC) Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9ACA6203C4 for ; Wed, 23 Mar 2016 20:12:28 +0000 (UTC) Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u2NKCF1E021045 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 23 Mar 2016 20:12:15 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id u2NKCFfY024647 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 23 Mar 2016 20:12:15 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1aip8Z-0005Ww-1a; Wed, 23 Mar 2016 13:12:15 -0700 Received: from userv0022.oracle.com ([156.151.31.74]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1aip8W-0005Wa-QS for ocfs2-devel@oss.oracle.com; Wed, 23 Mar 2016 13:12:12 -0700 Received: from aserp1020.oracle.com (aserp1020.oracle.com [141.146.126.67]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u2NKCCZj022560 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 23 Mar 2016 20:12:12 GMT Received: from userp2040.oracle.com (userp2040.oracle.com [156.151.31.90]) by aserp1020.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u2NKCBdd026624 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Wed, 23 Mar 2016 20:12:11 GMT Received: from pps.filterd (userp2040.oracle.com [127.0.0.1]) by userp2040.oracle.com (8.15.0.59/8.15.0.59) with SMTP id u2NKCBUY023134 for ; Wed, 23 Mar 2016 20:12:11 GMT Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) by userp2040.oracle.com with ESMTP id 21u828gv8x-1 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 23 Mar 2016 20:12:11 +0000 Received: from akpm3.mtv.corp.google.com (unknown [104.132.1.65]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 8488A69; Wed, 23 Mar 2016 20:12:09 +0000 (UTC) Date: Wed, 23 Mar 2016 13:12:09 -0700 From: akpm@linux-foundation.org To: mfasheh@suse.de, jlbec@evilplan.org, junxiao.bi@oracle.com, joseph.qi@huawei.com, ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org, ryan.ding@oracle.com Message-ID: <56f2f899.3BIexEaPTpR1F6ZV%akpm@linux-foundation.org> User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:140.211.169.12/30 include:_spf.google.com ~all X-ServerName: mail.linuxfoundation.org X-Proofpoint-Virus-Version: vendor=nai engine=5800 definitions=8113 signatures=670704 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1601100000 definitions=main-1603230309 Subject: [Ocfs2-devel] [patch 09/25] ocfs2: fix ip_unaligned_aio deadlock with dio work queue X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: aserv0022.oracle.com [141.146.126.234] X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Ryan Ding Subject: ocfs2: fix ip_unaligned_aio deadlock with dio work queue In the current implementation of unaligned aio+dio, lock order behave as follow: in user process context: -> call io_submit() -> get i_mutex <== window1 -> get ip_unaligned_aio -> submit direct io to block device -> release i_mutex -> io_submit() return in dio work queue context(the work queue is created in __blockdev_direct_IO): -> release ip_unaligned_aio <== window2 -> get i_mutex -> clear unwritten flag & change i_size -> release i_mutex There is a limitation to the thread number of dio work queue. 256 at default. If all 256 thread are in the above 'window2' stage, and there is a user process in the 'window1' stage, the system will became deadlock. Since the user process hold i_mutex to wait ip_unaligned_aio lock, while there is a direct bio hold ip_unaligned_aio mutex who is waiting for a dio work queue thread to be schedule. But all the dio work queue thread is waiting for i_mutex lock in 'window2'. This case only happened in a test which send a large number(more than 256) of aio at one io_submit() call. My design is to remove ip_unaligned_aio lock. Change it to a sync io instead. Just like ip_unaligned_aio lock, serialize the unaligned aio dio. [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi] Signed-off-by: Ryan Ding Reviewed-by: Junxiao Bi Cc: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton --- fs/ocfs2/aops.c | 6 ------ fs/ocfs2/aops.h | 8 -------- fs/ocfs2/file.c | 27 +++++++++------------------ fs/ocfs2/inode.h | 3 --- fs/ocfs2/super.c | 1 - 5 files changed, 9 insertions(+), 36 deletions(-) diff -puN fs/ocfs2/aops.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue fs/ocfs2/aops.c --- a/fs/ocfs2/aops.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue +++ a/fs/ocfs2/aops.c @@ -2391,12 +2391,6 @@ static int ocfs2_dio_end_io(struct kiocb /* this io's submitter should not have unlocked this before we could */ BUG_ON(!ocfs2_iocb_is_rw_locked(iocb)); - if (ocfs2_iocb_is_unaligned_aio(iocb)) { - ocfs2_iocb_clear_unaligned_aio(iocb); - - mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio); - } - if (private) ocfs2_dio_end_io_write(inode, private, offset, bytes); diff -puN fs/ocfs2/aops.h~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue fs/ocfs2/aops.h --- a/fs/ocfs2/aops.h~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue +++ a/fs/ocfs2/aops.h @@ -84,7 +84,6 @@ static inline void ocfs2_iocb_set_rw_loc enum ocfs2_iocb_lock_bits { OCFS2_IOCB_RW_LOCK = 0, OCFS2_IOCB_RW_LOCK_LEVEL, - OCFS2_IOCB_UNALIGNED_IO, OCFS2_IOCB_NUM_LOCKS }; @@ -93,11 +92,4 @@ enum ocfs2_iocb_lock_bits { #define ocfs2_iocb_rw_locked_level(iocb) \ test_bit(OCFS2_IOCB_RW_LOCK_LEVEL, (unsigned long *)&iocb->private) -#define ocfs2_iocb_set_unaligned_aio(iocb) \ - set_bit(OCFS2_IOCB_UNALIGNED_IO, (unsigned long *)&iocb->private) -#define ocfs2_iocb_clear_unaligned_aio(iocb) \ - clear_bit(OCFS2_IOCB_UNALIGNED_IO, (unsigned long *)&iocb->private) -#define ocfs2_iocb_is_unaligned_aio(iocb) \ - test_bit(OCFS2_IOCB_UNALIGNED_IO, (unsigned long *)&iocb->private) - #endif /* OCFS2_FILE_H */ diff -puN fs/ocfs2/file.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue fs/ocfs2/file.c --- a/fs/ocfs2/file.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue +++ a/fs/ocfs2/file.c @@ -2178,7 +2178,7 @@ static ssize_t ocfs2_file_write_iter(str struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); int full_coherency = !(osb->s_mount_opt & OCFS2_MOUNT_COHERENCY_BUFFERED); - int unaligned_dio = 0; + void *saved_ki_complete = NULL; int append_write = ((iocb->ki_pos + count) >= i_size_read(inode) ? 1 : 0); @@ -2241,17 +2241,12 @@ static ssize_t ocfs2_file_write_iter(str goto out; } - if (direct_io && !is_sync_kiocb(iocb)) - unaligned_dio = ocfs2_is_io_unaligned(inode, count, iocb->ki_pos); - - if (unaligned_dio) { + if (direct_io && !is_sync_kiocb(iocb) && + ocfs2_is_io_unaligned(inode, count, iocb->ki_pos)) { /* - * Wait on previous unaligned aio to complete before - * proceeding. + * Make it a sync io if it's an unaligned aio. */ - mutex_lock(&OCFS2_I(inode)->ip_unaligned_aio); - /* Mark the iocb as needing an unlock in ocfs2_dio_end_io */ - ocfs2_iocb_set_unaligned_aio(iocb); + saved_ki_complete = xchg(&iocb->ki_complete, NULL); } /* communicate with ocfs2_dio_end_io */ @@ -2272,11 +2267,10 @@ static ssize_t ocfs2_file_write_iter(str */ if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) { rw_level = -1; - unaligned_dio = 0; } if (unlikely(written <= 0)) - goto no_sync; + goto out; if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode)) { @@ -2298,13 +2292,10 @@ static ssize_t ocfs2_file_write_iter(str iocb->ki_pos - 1); } -no_sync: - if (unaligned_dio && ocfs2_iocb_is_unaligned_aio(iocb)) { - ocfs2_iocb_clear_unaligned_aio(iocb); - mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio); - } - out: + if (saved_ki_complete) + xchg(&iocb->ki_complete, saved_ki_complete); + if (rw_level != -1) ocfs2_rw_unlock(inode, rw_level); diff -puN fs/ocfs2/inode.h~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue fs/ocfs2/inode.h --- a/fs/ocfs2/inode.h~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue +++ a/fs/ocfs2/inode.h @@ -43,9 +43,6 @@ struct ocfs2_inode_info /* protects extended attribute changes on this inode */ struct rw_semaphore ip_xattr_sem; - /* Number of outstanding AIO's which are not page aligned */ - struct mutex ip_unaligned_aio; - /* These fields are protected by ip_lock */ spinlock_t ip_lock; u32 ip_open_count; diff -puN fs/ocfs2/super.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue fs/ocfs2/super.c --- a/fs/ocfs2/super.c~ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue +++ a/fs/ocfs2/super.c @@ -1747,7 +1747,6 @@ static void ocfs2_inode_init_once(void * INIT_LIST_HEAD(&oi->ip_io_markers); INIT_LIST_HEAD(&oi->ip_unwritten_list); oi->ip_dir_start_lookup = 0; - mutex_init(&oi->ip_unaligned_aio); init_rwsem(&oi->ip_alloc_sem); init_rwsem(&oi->ip_xattr_sem); mutex_init(&oi->ip_io_mutex);