From patchwork Sun Feb 14 23:43:12 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Al Viro X-Patchwork-Id: 8306241 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 7A10D9F372 for ; Sun, 14 Feb 2016 23:43:23 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 8711B204CF for ; Sun, 14 Feb 2016 23:43:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 639402049D for ; Sun, 14 Feb 2016 23:43:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753978AbcBNXnR (ORCPT ); Sun, 14 Feb 2016 18:43:17 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:42194 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753029AbcBNXnQ (ORCPT ); Sun, 14 Feb 2016 18:43:16 -0500 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.76 #1 (Red Hat Linux)) id 1aV6Js-0007zU-Dq; Sun, 14 Feb 2016 23:43:12 +0000 Date: Sun, 14 Feb 2016 23:43:12 +0000 From: Al Viro To: Mike Marshall Cc: Martin Brandenburg , Linus Torvalds , linux-fsdevel , Stephen Rothwell Subject: Re: Orangefs ABI documentation Message-ID: <20160214234312.GX17997@ZenIV.linux.org.uk> References: <20160209231328.GK17997@ZenIV.linux.org.uk> <20160211004432.GM17997@ZenIV.linux.org.uk> <20160212042757.GP17997@ZenIV.linux.org.uk> <20160213174738.GR17997@ZenIV.linux.org.uk> <20160214025615.GU17997@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote: > I added the list_del... > > Everything is very resilient, I killed > the client-core over and over while dbench > was running at the same time as ls -R > was running, and the client-core always > restarted... until finally, it didn't. I guess > related to the state of just what was going on > at the time... Hit the WARN_ON in service_operation, > and then oopsed on the orangefs_bufmap_put > down at the end of wait_for_direct_io... Bloody hell... I think I see what's going on, and presumably the newer slot allocator would fix that. Look: closing control device (== daemon death) checks if we have a bufmap installed and drops a reference to it in that case. The reason why it's conditional is that we might have not gotten around to installing one (it's done via ioctl on control device). But ->release() does *NOT* wait for all references to go away! In other words, it's possible to restart the daemon while the old bufmap is still there. Then have it killed after it has opened control devices and before the old bufmap has run down. For ->release() it looks like we *have* gotten around to installing bufmap, and need the reference dropped. In reality, the reference acquired when we were installing that one has already been dropped, so we get double put. With expected results... If below ends up fixing the symptoms, analysis above has a good chance to be correct. This is no way to wait for rundown, of course - I'm not suggesting it as the solution, just as a way to narrow down what's going on. Incidentally, could you fold the list_del() part into offending commit (orangefs: delay freeing slot until cancel completes) and repush your for-next? --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c index 6a7df12..630246d 100644 --- a/fs/orangefs/devorangefs-req.c +++ b/fs/orangefs/devorangefs-req.c @@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file) purge_inprogress_ops(); gossip_debug(GOSSIP_DEV_DEBUG, "pvfs2-client-core: device close complete\n"); + /* VERY CRUDE, NOT FOR MERGE */ + while (orangefs_get_bufmap_init()) + schedule_timeout(HZ); open_access_count = 0; mutex_unlock(&devreq_mutex); return 0; diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h index 41f8bb1f..1e28555 100644 --- a/fs/orangefs/orangefs-kernel.h +++ b/fs/orangefs/orangefs-kernel.h @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op) { spin_lock(&op->lock); if (unlikely(op_is_cancel(op))) { + list_del(&op->list); spin_unlock(&op->lock); put_cancel(op); } else {