Message ID | 1454336914.133285.297.camel@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello David: > > Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in > > jffs2_write_begin" > > http://article.gmane.org/gmane.linux.drivers.mtd/62951 > > > > This is a patch revising my original patch, which I sent to linux-mtd on > > 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding > > patches above. > > That looks necessary but not sufficient. I think we need this > (untested) patch on top of it, to ensure that we *always* take the page > lock before f->sem? > > Please could you try what's in the tree at > http://git.infradead.org/users/dwmw2/jffs2-fixes.git I have been using a variant of Deng Chao's patch here for a long time, so that one has been tested quite a bit: http://lists.infradead.org/pipermail/linux-mtd/2013-August/048352.html. The problem with that patch was that it modified mm/filemap.c and include/linux/pagemap.h, which we were not too happy about. Your patch looks much simpler, and I will definitely test it. It may take a few days, though, as I have to unearth the test scripts, and find a time slot for testing. Best regards, Thomas Betker -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello David: > > Please could you try what's in the tree at > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git > Your patch looks much simpler, and I will definitely test it. It may > take a few days, though, as I have to unearth the test scripts, and > find a time slot for testing. Here is what I did (sorry for the wait, things were piling up): 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't changed much since then, so this shouldn't make a difference. Added a printk() before mutex_unlock(&f->sem) to check if the prospective page was locked, i.e. if the deadlock situation actually occurs. 2) On my target system, started wangzaiwei's test (with some fixes), plus a loop copying a large file over and over (to get GC rolling, which increases the chance of a deadlock). 3) After 24 hours, the system was still alive, and the printk() had been hit 32 times. So yes, I am confident that your patch avoids the deadlock, and if that's good enough for you, please add my Tested-by:. However, I am going to run some more stress tests here just to check that there weren't any unexpected side effects. (Don't get me wrong -- I am sure the patch is fine, but for me it's a case of "once bitten, twice shy" ...) Best regards, Thomas Betker -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-02-18 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote: > Hello David: > > > > > > > > > Please could you try what's in the tree at > > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git > > > > Your patch looks much simpler, and I will definitely test it. It may > > take a few days, though, as I have to unearth the test scripts, and > > find a time slot for testing. > Here is what I did (sorry for the wait, things were piling up): > > 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix > page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't > changed much since then, so this shouldn't make a difference. Added a > printk() before mutex_unlock(&f->sem) to check if the prospective page was > locked, i.e. if the deadlock situation actually occurs. > > 2) On my target system, started wangzaiwei's test (with some fixes), plus > a loop copying a large file over and over (to get GC rolling, which > increases the chance of a deadlock). > > 3) After 24 hours, the system was still alive, and the printk() had been > hit 32 times. > > So yes, I am confident that your patch avoids the deadlock, and if that's > good enough for you, please add my Tested-by:. However, I am going to run > some more stress tests here just to check that there weren't any > unexpected side effects. (Don't get me wrong -- I am sure the patch is > fine, but for me it's a case of "once bitten, twice shy" ...) Can we get this upstream before next release? I don't think there will much more testing at this point. Jocke-- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Joakim: > Can we get this upstream before next release? I don't think there > will much more > testing at this point. I would second this. Actually, I did some additional stress testing, but didn't see any problems. Best regards, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-02-25 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote: > Hello Joakim: > > > Can we get this upstream before next release? I don't think there > > will much more > > testing at this point. > > I would second this. Actually, I did some additional stress testing, but > didn't see any problems. OK, of the four fixes I had in my branch, three are Cc:stable and I have pushed them to linux-mtd.git. I'll give them a day or two in linux-next and then send a pull request. The fourth, which is just a mount speedup, I've put in l2-mtd.git. Brian, was that the right way to do it? I feel like an impostor pushing to my own repositories now :)
On Thu, Feb 25, 2016 at 11:22:12AM +0000, David Woodhouse wrote: > OK, of the four fixes I had in my branch, three are Cc:stable and I > have pushed them to linux-mtd.git. I'll give them a day or two in > linux-next and then send a pull request. > > The fourth, which is just a mount speedup, I've put in l2-mtd.git. > > Brian, was that the right way to do it? I feel like an impostor pushing > to my own repositories now :) Nope, that's perfect. Happy to have a JFFS2 maintainer again :) (Feel free to send the MAINTAINERS update that's also in linux-mtd.git/master. It was meant to ride along with whatever else needed pushed to Linus.) Brian -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/jffs2/README.Locking b/fs/jffs2/README.Locking index 3ea3655..8918ac9 100644 --- a/fs/jffs2/README.Locking +++ b/fs/jffs2/README.Locking @@ -2,10 +2,6 @@ JFFS2 LOCKING DOCUMENTATION --------------------------- -At least theoretically, JFFS2 does not require the Big Kernel Lock -(BKL), which was always helpfully obtained for it by Linux 2.4 VFS -code. It has its own locking, as described below. - This document attempts to describe the existing locking rules for JFFS2. It is not expected to remain perfectly up to date, but ought to be fairly close. @@ -69,6 +65,7 @@ Ordering constraints: any f->sem held. 2. Never attempt to lock two file mutexes in one thread. No ordering rules have been made for doing so. + 3. Never lock a page cache page with f->sem held. erase_completion_lock spinlock diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c index 6fb0802..5919fef 100644 --- a/fs/jffs2/gc.c +++ b/fs/jffs2/gc.c @@ -1316,14 +1316,17 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era BUG_ON(start > orig_start); } - /* First, use readpage() to read the appropriate page into the page cache */ - /* Q: What happens if we actually try to GC the _same_ page for which commit_write() - * triggered garbage collection in the first place? - * A: I _think_ it's OK. read_cache_page shouldn't deadlock, we'll write out the - * page OK. We'll actually write it out again in commit_write, which is a little - * suboptimal, but at least we're correct. - */ + /* The rules state that we must obtain the page lock *before* f->sem, so + * drop f->sem temporarily. Since we also hold c->alloc_sem, nothing's + * actually going to *change* so we're safe; we only allow reading. + * + * It is important to note that jffs2_write_begin() will ensure that its + * page is marked Uptodate before allocating space. That means that if we + * end up here trying to GC the *same* page that jffs2_write_begin() is + * trying to write out, read_cache_page() will not deadlock. */ + mutex_unlock(&f->sem); pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg); + mutex_lock(&f->sem); if (IS_ERR(pg_ptr)) { pr_warn("read_cache_page() returned error: %ld\n",