Message ID | 534FB63A.9090402@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote: > > Once JBD2_ABORT is set, ocfs2_commit_cache will fail in > ocfs2_commit_thread. Then it will get into a loop with mass logs. This > will meaninglessly consume a larger number of resource and may lead to > system hung at last. > So limit printk in this case. > > Signed-off-by: Joseph Qi <joseph.qi@huawei.com> > --- > fs/ocfs2/journal.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index 44fc3e5..cfefbd1 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -30,6 +30,7 @@ > #include <linux/kthread.h> > #include <linux/time.h> > #include <linux/random.h> > +#include <linux/delay.h> > > #include <cluster/masklog.h> > > @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) > || kthread_should_stop()); > > status = ocfs2_commit_cache(osb); > - if (status < 0) > - mlog_errno(status); > + if (status < 0) { > + static unsigned long abort_warn_time; > + > + /* Warn about this once per minute */ > + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > + mlog(ML_ERROR, "status = %d, journal is " > + "already aborted.\n", status); > + msleep_interruptible(1000); > + } Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue right after this anyway - is there a problem with it waiting on that? Generally I really don't like peppering msleep() into the code where we might need to sleep - there is often a more elegant solution available. Thanks, --Mark -- Mark Fasheh
On 2014/4/18 5:01, Mark Fasheh wrote: > On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote: >> >> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in >> ocfs2_commit_thread. Then it will get into a loop with mass logs. This >> will meaninglessly consume a larger number of resource and may lead to >> system hung at last. >> So limit printk in this case. >> >> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> >> --- >> fs/ocfs2/journal.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c >> index 44fc3e5..cfefbd1 100644 >> --- a/fs/ocfs2/journal.c >> +++ b/fs/ocfs2/journal.c >> @@ -30,6 +30,7 @@ >> #include <linux/kthread.h> >> #include <linux/time.h> >> #include <linux/random.h> >> +#include <linux/delay.h> >> >> #include <cluster/masklog.h> >> >> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) >> || kthread_should_stop()); >> >> status = ocfs2_commit_cache(osb); >> - if (status < 0) >> - mlog_errno(status); >> + if (status < 0) { >> + static unsigned long abort_warn_time; >> + >> + /* Warn about this once per minute */ >> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >> + mlog(ML_ERROR, "status = %d, journal is " >> + "already aborted.\n", status); >> + msleep_interruptible(1000); >> + } > > Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue > right after this anyway - is there a problem with it waiting on that? > Since jbd2 is already aborted, commit cache is meaningless. > Generally I really don't like peppering msleep() into the code where we > might need to sleep - there is often a more elegant solution available. > > Thanks, > --Mark > > -- > Mark Fasheh > >
On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote: > On 2014/4/18 5:01, Mark Fasheh wrote: > > On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote: > >> > >> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in > >> ocfs2_commit_thread. Then it will get into a loop with mass logs. This > >> will meaninglessly consume a larger number of resource and may lead to > >> system hung at last. > >> So limit printk in this case. > >> > >> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> > >> --- > >> fs/ocfs2/journal.c | 12 ++++++++++-- > >> 1 file changed, 10 insertions(+), 2 deletions(-) > >> > >> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > >> index 44fc3e5..cfefbd1 100644 > >> --- a/fs/ocfs2/journal.c > >> +++ b/fs/ocfs2/journal.c > >> @@ -30,6 +30,7 @@ > >> #include <linux/kthread.h> > >> #include <linux/time.h> > >> #include <linux/random.h> > >> +#include <linux/delay.h> > >> > >> #include <cluster/masklog.h> > >> > >> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) > >> || kthread_should_stop()); > >> > >> status = ocfs2_commit_cache(osb); > >> - if (status < 0) > >> - mlog_errno(status); > >> + if (status < 0) { > >> + static unsigned long abort_warn_time; > >> + > >> + /* Warn about this once per minute */ > >> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > >> + mlog(ML_ERROR, "status = %d, journal is " > >> + "already aborted.\n", status); > >> + msleep_interruptible(1000); > >> + } > > > > Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue > > right after this anyway - is there a problem with it waiting on that? > > > Since jbd2 is already aborted, commit cache is meaningless. I understand that, but I'm asking why the msleep and whether we can avoid that. To go back to my question: "ocfs2_commit_thread will wait on the checkpoint_event queue right after this anyway - is there a problem with it waiting on that?" Thanks, --Mark -- Mark Fasheh
On 2014/4/18 10:45, Mark Fasheh wrote: > On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote: >> On 2014/4/18 5:01, Mark Fasheh wrote: >>> On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote: >>>> >>>> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in >>>> ocfs2_commit_thread. Then it will get into a loop with mass logs. This >>>> will meaninglessly consume a larger number of resource and may lead to >>>> system hung at last. >>>> So limit printk in this case. >>>> >>>> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> >>>> --- >>>> fs/ocfs2/journal.c | 12 ++++++++++-- >>>> 1 file changed, 10 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c >>>> index 44fc3e5..cfefbd1 100644 >>>> --- a/fs/ocfs2/journal.c >>>> +++ b/fs/ocfs2/journal.c >>>> @@ -30,6 +30,7 @@ >>>> #include <linux/kthread.h> >>>> #include <linux/time.h> >>>> #include <linux/random.h> >>>> +#include <linux/delay.h> >>>> >>>> #include <cluster/masklog.h> >>>> >>>> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) >>>> || kthread_should_stop()); >>>> >>>> status = ocfs2_commit_cache(osb); >>>> - if (status < 0) >>>> - mlog_errno(status); >>>> + if (status < 0) { >>>> + static unsigned long abort_warn_time; >>>> + >>>> + /* Warn about this once per minute */ >>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >>>> + mlog(ML_ERROR, "status = %d, journal is " >>>> + "already aborted.\n", status); >>>> + msleep_interruptible(1000); >>>> + } >>> >>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue >>> right after this anyway - is there a problem with it waiting on that? >>> >> Since jbd2 is already aborted, commit cache is meaningless. > > I understand that, but I'm asking why the msleep and whether we can avoid > that. To go back to my question: > > "ocfs2_commit_thread will wait on the checkpoint_event queue right after > this anyway - is there a problem with it waiting on that?" > > Thanks, > --Mark Sorry for my obscure description. If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared. Then the condition of checkpoint event still evaluates true, so it won't wait. > > -- > Mark Fasheh > > . >
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 44fc3e5..cfefbd1 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -30,6 +30,7 @@ #include <linux/kthread.h> #include <linux/time.h> #include <linux/random.h> +#include <linux/delay.h> #include <cluster/masklog.h> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) || kthread_should_stop()); status = ocfs2_commit_cache(osb); - if (status < 0) - mlog_errno(status); + if (status < 0) { + static unsigned long abort_warn_time; + + /* Warn about this once per minute */ + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) + mlog(ML_ERROR, "status = %d, journal is " + "already aborted.\n", status); + msleep_interruptible(1000); + } if (kthread_should_stop() && atomic_read(&journal->j_num_trans)){ mlog(ML_KTHREAD,
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in ocfs2_commit_thread. Then it will get into a loop with mass logs. This will meaninglessly consume a larger number of resource and may lead to system hung at last. So limit printk in this case. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> --- fs/ocfs2/journal.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)