From patchwork Tue Nov 20 13:34:13 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Schmidt
X-Patchwork-Id: 1773611
Return-Path:
X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork1.kernel.org (Postfix) with ESMTP id 959433FD1A
	for ; Tue, 20 Nov 2012 13:34:24 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753454Ab2KTNeU (ORCPT ); Tue, 20 Nov 2012 08:34:20 -0500
Received: from mo-p05-ob.rzone.de ([81.169.146.180]:8136 "EHLO
	mo-p05-ob.rzone.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752421Ab2KTNeS (ORCPT );
	Tue, 20 Nov 2012 08:34:18 -0500
X-RZG-AUTH: :O2kGeEG7b/pS1FyqBm+hjA1c6aeyuWq+PytuIqkA3nqK8G++lLpuGSNVtTT+p7ooGfF+ceXvod20b52aCcSaG3pdSQ1BlsF87dlXTeeFZfhM6WHVqQ==
X-RZG-CLASS-ID: mo05
Received: from [IPv6:2a01:238:e100:320:7271:bcff:fe43:2d12]
	([2a01:238:e100:320:7271:bcff:fe43:2d12])
	by smtp.strato.de (josoe mo9) (RZmta 31.4 AUTH)
	with ESMTPA id m01a23oAKD64x6 ; Tue, 20 Nov 2012 14:34:13 +0100 (CET)
Message-ID: <50AB86D5.6040501@jan-o-sch.net>
Date: Tue, 20 Nov 2012 14:34:13 +0100
From: Jan Schmidt
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.28)
	Gecko/20120313 Lightning/1.0b2 Thunderbird/3.1.20
MIME-Version: 1.0
To: linux-btrfs
CC: Arne Jansen, Chris Mason
Subject: Scrub racing with btrfs workers on umount
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

I found a race condition in the way scrub interacts with btrfs workers:

Please carefully distinguish "workers" as struct btrfs_workers from "worker" as
struct btrfs_worker_thread.

Process A: umount
Process B: scrub-workers
Process C: genwork-worker

A: close_ctree()
A: scrub_cancel()
B: scrub_workers_put()
B: btrfs_stop_workers()
B: -> lock(workers->lock)
B: -> splice idle worker list into worker list
B: -> while (worker list)
B: ->     worker = delete from worker list
B: ->     unlock(workers->lock)
B: ->     kthread_stop(worker)
C: __btrfs_start_workers()
C: -> kthread_run starts new scrub-worker
C: -> lock(workers->lock) /* from scrub-workers */
C: -> add to workers idle list
C: -> unlock(workers->lock)
B: ->     lock(workers->lock)
A: stop genwork-worker

In that situation, btrfs_stop_workers() returns although the scrub-workers
still have one idle worker: the worker was added to the idle list while B had
dropped workers->lock, after the idle list had already been spliced.

There are several strategies to fix this:
1. Let close_ctree() call scrub_cancel() after stopping the generic worker.
2. Let btrfs_stop_workers() be more careful about insertions to the idle list,
   like the patch below.
3. Find out why __btrfs_start_workers() gets called in the above scenario at
   all.

Any thoughts? Otherwise, I'm going for option 3 and will report back.

-Jan

---
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 58b7d14..2523781 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -431,6 +431,8 @@ void btrfs_stop_workers(struct btrfs_workers *workers)
 			kthread_stop(worker->task);
 		spin_lock_irq(&workers->lock);
 		put_worker(worker);
+		/* we dropped the lock above, check for new idle workers */
+		list_splice_init(&workers->idle_list, &workers->worker_list);
 	}
 	spin_unlock_irq(&workers->lock);
 }
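
For illustration only, the interleaving above and the effect of option 2 can
be modelled in plain userspace C. This is a rough single-threaded sketch, not
kernel code: struct model_workers, the model_*() helpers and
leftover_idle_workers() are all hypothetical names, the two lists are reduced
to counters, and the "lock" is just a flag that asserts correct pairing. The
late start by process C is injected deterministically into the window where B
has dropped the lock.

```c
#include <assert.h>

/* Hypothetical userspace model; NOT the kernel's struct btrfs_workers. */
struct model_workers {
	int worker_list;	/* workers still to be stopped */
	int idle_list;		/* idle workers */
	int locked;		/* 1 while the modelled workers->lock is held */
	int late_starts;	/* workers process C will still add */
};

static void model_lock(struct model_workers *w)
{
	assert(!w->locked);
	w->locked = 1;
}

static void model_unlock(struct model_workers *w)
{
	assert(w->locked);
	w->locked = 0;
}

/* Stands in for list_splice_init(&idle_list, &worker_list). */
static void model_splice_idle(struct model_workers *w)
{
	assert(w->locked);
	w->worker_list += w->idle_list;
	w->idle_list = 0;
}

/* Stands in for process C adding a new worker to the idle list. */
static void model_late_start(struct model_workers *w)
{
	if (w->late_starts == 0)
		return;
	model_lock(w);
	w->idle_list++;
	w->late_starts--;
	model_unlock(w);
}

/* Stands in for process B; returns the idle workers left behind. */
static int model_stop_workers(struct model_workers *w, int resplice)
{
	model_lock(w);
	model_splice_idle(w);
	while (w->worker_list > 0) {
		w->worker_list--;	/* delete from worker list */
		model_unlock(w);
		model_late_start(w);	/* race window around kthread_stop() */
		model_lock(w);
		if (resplice)		/* the check proposed in option 2 */
			model_splice_idle(w);
	}
	model_unlock(w);
	return w->idle_list;
}

/* Run one scenario and report how many idle workers were missed. */
int leftover_idle_workers(int initial, int late_starts, int resplice)
{
	struct model_workers w = {
		.worker_list = initial,
		.idle_list = 0,
		.locked = 0,
		.late_starts = late_starts,
	};

	return model_stop_workers(&w, resplice);
}
```

In this model, leftover_idle_workers(2, 1, 0) reports one worker left on the
idle list, matching the trace, while leftover_idle_workers(2, 1, 1) reports
none. The model says nothing about why __btrfs_start_workers() runs at that
point, which is the question behind option 3.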