Message ID | 20150923210729.GA23180@mtj.duckdns.org (mailing list archive) |
---|---|
State | New, archived |
On Wed, 2015-09-23 at 17:07 -0400, Tejun Heo wrote:
> Hello,
>
> So, this should make the regression go away. It doesn't fix the
> underlying bugs but they shouldn't get triggered by people not
> experimenting with cgroup.

Tejun,

this hits the nail on the head and makes the problem go away.

I've tested the tip of Linus's tree (v4.3-rc2+) plus this patch - no
data corruption after reboots.

I've tested just the tip of Linus's tree (v4.3-rc2+) without this
patch, and I do see the data corruption after reboots.

Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

Artem.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

> On Wed, 2015-09-23 at 17:07 -0400, Tejun Heo wrote:
> > Hello,
> >
> > So, this should make the regression go away. It doesn't fix the
> > underlying bugs but they shouldn't get triggered by people not
> > experimenting with cgroup.
>
> Tejun,
>
> this hits the nail on the head and makes the problem go away.
>
> I've tested the tip of Linus's tree (v4.3-rc2+) plus this patch - no
> data corruption after reboots.
>
> I've tested just the tip of Linus's tree (v4.3-rc2+) without this
> patch, and I do see the data corruption after reboots.
>
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> Artem.

I can confirm the patch fixes my "slow write" issue too.

Tested-by: Dexuan Cui <decui@microsoft.com>

-- Dexuan

On 09/23/2015 03:07 PM, Tejun Heo wrote:
> inode_cgwb_enabled() gates cgroup writeback support. If it returns
> true, each inode is attached to the corresponding memory domain which
> gets mapped to io domain. It currently only tests whether the
> filesystem and bdi support cgroup writeback; however, cgroup writeback
> support doesn't work on traditional hierarchies and thus it should
> also test whether memcg and iocg are on the default hierarchy.
>
> This caused traditional hierarchy setups to hit the cgroup writeback
> path inadvertently and ended up creating separate writeback domains
> for each memcg and mapping them all to the root iocg uncovering a
> couple issues in the cgroup writeback path.
>
> cgroup writeback was never meant to be enabled on traditional
> hierarchies. Make inode_cgwb_enabled() test whether both memcg and
> iocg are on the default hierarchy.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
> Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net
> ---
> Hello,
>
> So, this should make the regression go away. It doesn't fix the
> underlying bugs but they shouldn't get triggered by people not
> experimenting with cgroup.
>
> I'm gonna keep digging the underlying issues but this should make the
> regressions go away. If it's okay, I think it'd be better to route
> this through cgroup/for-4.3-fixes as it's gonna cause a conflict with
> for-4.4 branch and handling the merge there is easier.
>
> Thanks.
>
> include/linux/backing-dev.h | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)

I'll ack this since it works around both the corruption issue and the
performance regression, so we can avoid having to revert parts of this.
And I know you'll keep hunting and get the real issue fixed in the
meantime.

Acked-by: Jens Axboe <axboe@fb.com>

Hello,

On Thu, Sep 24, 2015 at 08:40:18AM +0000, Dexuan Cui wrote:
> I can confirm the patch fixes my "slow write" issue too.
>
> Tested-by: Dexuan Cui <decui@microsoft.com>

Yeah, this should make it go away w/o using cgroup writeback
explicitly; however, I think the proper solution for cgroup writeback
is moving bandwidth estimation from memory domain to io domain so that
two separate bw estimations wouldn't interfere with each other leading
to unexpected outcomes. I'll work on the changes.

Thanks.

On Wed, Sep 23, 2015 at 05:07:29PM -0400, Tejun Heo wrote:
> inode_cgwb_enabled() gates cgroup writeback support. If it returns
> true, each inode is attached to the corresponding memory domain which
> gets mapped to io domain. It currently only tests whether the
> filesystem and bdi support cgroup writeback; however, cgroup writeback
> support doesn't work on traditional hierarchies and thus it should
> also test whether memcg and iocg are on the default hierarchy.
>
> This caused traditional hierarchy setups to hit the cgroup writeback
> path inadvertently and ended up creating separate writeback domains
> for each memcg and mapping them all to the root iocg uncovering a
> couple issues in the cgroup writeback path.
>
> cgroup writeback was never meant to be enabled on traditional
> hierarchies. Make inode_cgwb_enabled() test whether both memcg and
> iocg are on the default hierarchy.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
> Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net

Applying to cgroup/for-4.3-fixes.

Thanks.

Hello,

On Thu, Sep 24, 2015 at 04:47:36PM -0400, Tejun Heo wrote:
> On Thu, Sep 24, 2015 at 08:40:18AM +0000, Dexuan Cui wrote:
> > I can confirm the patch fixes my "slow write" issue too.
> >
> > Tested-by: Dexuan Cui <decui@microsoft.com>
>
> Yeah, this should make it go away w/o using cgroup writeback
> explicitly; however, I think the proper solution for cgroup writeback
> is moving bandwidth estimation from memory domain to io domain so that
> two separate bw estimations wouldn't interfere with each other leading
> to unexpected outcomes. I'll work on the changes.

So, this one actually turns out to be mostly caused by enabling cgroup
writeback when it shouldn't be. balance_dirty_pages() ended up
looking at a different bdi_writeback from the actual writeback path so
the throttling was completely off, so making sure that cgroup
writeback doesn't get turned on for traditional hierarchies is the
right solution here.

While auditing the behavior, I noticed a couple non-critical issues.
Will post patches to fix them soon.

Thanks.

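The mismatch described above, with the throttling side consulting one bdi_writeback while the writeback path works on another, can be pictured with a toy user-space model. This is only a sketch under loud assumptions: `struct model_wb`, `model_writeback_done()` and `model_may_dirty()` are hypothetical names invented for illustration, the bandwidth numbers are arbitrary, and none of it is the kernel's actual throttling logic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a writeback context; NOT struct bdi_writeback. */
struct model_wb {
	unsigned long write_bw;	/* estimated write bandwidth, arbitrary units */
};

/* Writeback path: refresh the estimate on the wb that actually did the I/O. */
static void model_writeback_done(struct model_wb *wb, unsigned long observed_bw)
{
	wb->write_bw = observed_bw;
}

/* Throttling side: allow dirtying only up to the estimate of the wb consulted. */
static bool model_may_dirty(const struct model_wb *wb, unsigned long dirty_rate)
{
	return dirty_rate <= wb->write_bw;
}
```

If the writeback path updates one `model_wb` and the throttling side reads a different one, the second never sees the observed bandwidth, so its decisions are based on a stale estimate. Keeping both sides on the same context removes the mismatch in this model.
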
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5a5d79e..d5eb4ad1 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/blkdev.h>
 #include <linux/writeback.h>
+#include <linux/memcontrol.h>
 #include <linux/blk-cgroup.h>
 #include <linux/backing-dev-defs.h>
 #include <linux/slab.h>
@@ -252,13 +253,19 @@ int inode_congested(struct inode *inode, int cong_bits);
  * @inode: inode of interest
  *
  * cgroup writeback requires support from both the bdi and filesystem.
- * Test whether @inode has both.
+ * Also, both memcg and iocg have to be on the default hierarchy.  Test
+ * whether all conditions are met.
+ *
+ * Note that the test result may change dynamically on the same inode
+ * depending on how memcg and iocg are configured.
  */
 static inline bool inode_cgwb_enabled(struct inode *inode)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
 
-	return bdi_cap_account_dirty(bdi) &&
+	return cgroup_on_dfl(mem_cgroup_root_css->cgroup) &&
+		cgroup_on_dfl(blkcg_root_css->cgroup) &&
+		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
 		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
 }

inode_cgwb_enabled() gates cgroup writeback support. If it returns
true, each inode is attached to the corresponding memory domain which
gets mapped to io domain. It currently only tests whether the
filesystem and bdi support cgroup writeback; however, cgroup writeback
support doesn't work on traditional hierarchies and thus it should
also test whether memcg and iocg are on the default hierarchy.

This caused traditional hierarchy setups to hit the cgroup writeback
path inadvertently and ended up creating separate writeback domains
for each memcg and mapping them all to the root iocg uncovering a
couple issues in the cgroup writeback path.

cgroup writeback was never meant to be enabled on traditional
hierarchies. Make inode_cgwb_enabled() test whether both memcg and
iocg are on the default hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net
---
Hello,

So, this should make the regression go away. It doesn't fix the
underlying bugs but they shouldn't get triggered by people not
experimenting with cgroup.

I'm gonna keep digging the underlying issues but this should make the
regressions go away. If it's okay, I think it'd be better to route
this through cgroup/for-4.3-fixes as it's gonna cause a conflict with
for-4.4 branch and handling the merge there is easier.

Thanks.

 include/linux/backing-dev.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
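
The change in gating described in the commit message can be modelled in user space as a before/after pair of predicates. This is a minimal sketch, not kernel code: `struct model_inode` and both function names are hypothetical, and the model folds the controller hierarchy state into the per-inode struct for simplicity, whereas the patch tests the root css of memcg and blkcg.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in; none of these fields are kernel types. */
struct model_inode {
	bool memcg_on_dfl;		/* memory controller on the default hierarchy */
	bool blkcg_on_dfl;		/* io controller on the default hierarchy */
	bool bdi_accounts_dirty;	/* models bdi_cap_account_dirty(bdi) */
	bool bdi_cgwb;			/* models BDI_CAP_CGROUP_WRITEBACK */
	bool sb_cgwb;			/* models SB_I_CGROUPWB */
};

/* Check before the patch: only bdi and filesystem support are tested. */
static bool cgwb_enabled_old(const struct model_inode *in)
{
	return in->bdi_accounts_dirty && in->bdi_cgwb && in->sb_cgwb;
}

/* Check after the patch: both controllers must also be on the default hierarchy. */
static bool cgwb_enabled_new(const struct model_inode *in)
{
	return in->memcg_on_dfl && in->blkcg_on_dfl &&
	       cgwb_enabled_old(in);
}
```

In this model, a traditional hierarchy setup on a capable bdi and filesystem passes the old check but fails the new one, which corresponds to the path the regression reports were hitting inadvertently.
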