Message ID | 5187C19B.4040708@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Looks good to me. Acked-by: Sunil Mushran <sunil.mushran@gmail.com> On Mon, May 6, 2013 at 7:43 AM, Joseph Qi <joseph.qi@huawei.com> wrote: > In ocfs2_file_aio_write, it does ocfs2_rw_lock first and then > ocfs2_inode_lock. But if ocfs2_inode_lock failed, it goes to out_sems > without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free > when testing res->l_ex_holders, which is increased in > __ocfs2_cluster_lock and decreased in __ocfs2_cluster_unlock. > > Signed-off-by: Joseph Qi <joseph.qi@huawei.com> > > --- > fs/ocfs2/file.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > index 6474cb4..e2cd7a8 100644 > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -2290,7 +2290,7 @@ relock: > ret = ocfs2_inode_lock(inode, NULL, 1); > if (ret < 0) { > mlog_errno(ret); > - goto out_sems; > + goto out; > } > > ocfs2_inode_unlock(inode, 1); > -- > 1.7.9.7 > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
On Mon, 6 May 2013 22:43:39 +0800 Joseph Qi <joseph.qi@huawei.com> wrote: > In ocfs2_file_aio_write, it does ocfs2_rw_lock first and then > ocfs2_inode_lock. But if ocfs2_inode_lock failed, it goes to out_sems > without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free > when testing res->l_ex_holders, which is increased in > __ocfs2_cluster_lock and decreased in __ocfs2_cluster_unlock. > > ... > > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -2290,7 +2290,7 @@ relock: > ret = ocfs2_inode_lock(inode, NULL, 1); > if (ret < 0) { > mlog_errno(ret); > - goto out_sems; > + goto out; > } > > ocfs2_inode_unlock(inode, 1); That seems like a fairly serious bug. How long has it been there and what userspace actions are required to trigger it? (I'm trying to work out which kernel versions we should merge the fix into, but the changelog didn't give me enough info to determine this)
It should almost never trigger. ocfs2_inode_lock() should always succeed and only return after it has gotten the required lock. On Wed, May 8, 2013 at 12:38 PM, Andrew Morton <akpm@linux-foundation.org>wrote: > On Mon, 6 May 2013 22:43:39 +0800 Joseph Qi <joseph.qi@huawei.com> wrote: > > > In ocfs2_file_aio_write, it does ocfs2_rw_lock first and then > > ocfs2_inode_lock. But if ocfs2_inode_lock failed, it goes to out_sems > > without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free > > when testing res->l_ex_holders, which is increased in > > __ocfs2_cluster_lock and decreased in __ocfs2_cluster_unlock. > > > > ... > > > > --- a/fs/ocfs2/file.c > > +++ b/fs/ocfs2/file.c > > @@ -2290,7 +2290,7 @@ relock: > > ret = ocfs2_inode_lock(inode, NULL, 1); > > if (ret < 0) { > > mlog_errno(ret); > > - goto out_sems; > > + goto out; > > } > > > > ocfs2_inode_unlock(inode, 1); > > That seems like a fairly serious bug. How long has it been there and > what userspace actions are required to trigger it? > > (I'm trying to work out which kernel versions we should merge the > fix into, but the changelog didn't give me enough info to determine > this) > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
On 2013/5/9 3:38, Andrew Morton wrote: > On Mon, 6 May 2013 22:43:39 +0800 Joseph Qi <joseph.qi@huawei.com> wrote: > >> In ocfs2_file_aio_write, it does ocfs2_rw_lock first and then >> ocfs2_inode_lock. But if ocfs2_inode_lock failed, it goes to out_sems >> without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free >> when testing res->l_ex_holders, which is increased in >> __ocfs2_cluster_lock and decreased in __ocfs2_cluster_unlock. >> >> ... >> >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -2290,7 +2290,7 @@ relock: >> ret = ocfs2_inode_lock(inode, NULL, 1); >> if (ret < 0) { >> mlog_errno(ret); >> - goto out_sems; >> + goto out; >> } >> >> ocfs2_inode_unlock(inode, 1); > > That seems like a fairly serious bug. How long has it been there and > what userspace actions are required to trigger it? > > (I'm trying to work out which kernel versions we should merge the > fix into, but the changelog didn't give me enough info to determine > this) > > . > Sorry for the delayed reply. The reproducible case is lots of write IOs plus storage link down and then restore. And my kernel is 3.0.13.
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 6474cb4..e2cd7a8 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2290,7 +2290,7 @@ relock: ret = ocfs2_inode_lock(inode, NULL, 1); if (ret < 0) { mlog_errno(ret); - goto out_sems; + goto out; } ocfs2_inode_unlock(inode, 1);
In ocfs2_file_aio_write, it does ocfs2_rw_lock first and then ocfs2_inode_lock. But if ocfs2_inode_lock failed, it goes to out_sems without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free when testing res->l_ex_holders, which is increased in __ocfs2_cluster_lock and decreased in __ocfs2_cluster_unlock. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> --- fs/ocfs2/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)