Message ID | 20220125164337.2071854-3-bigeasy@linutronix.de (mailing list archive)
---|---
State | New
Series | mm/memcg: Address PREEMPT_RT problems instead of disabling it.
On 1/25/22 17:43, Sebastian Andrzej Siewior wrote:
> The per-CPU counters are modified with non-atomic operations.
> Consistency is ensured by disabling interrupts for the update. On
> non-PREEMPT_RT configurations this works because acquiring a
> spinlock_t typed lock with the _irq() suffix disables interrupts. On
> PREEMPT_RT configurations it does not, so the RMW operation can be
> interrupted.
>
> Another problem is that mem_cgroup_swapout() expects to be invoked
> with interrupts disabled because its caller acquires a spinlock_t
> with interrupts disabled. Since spinlock_t never disables interrupts
> on PREEMPT_RT, interrupts are never disabled at this point.
>
> The code is never called from in_irq() context on PREEMPT_RT,
> therefore disabling preemption during the update is sufficient on
> PREEMPT_RT. The sections which explicitly disable interrupts can
> remain unchanged on PREEMPT_RT because they are short and don't
> involve sleeping locks (memcg_check_events() does nothing on
> PREEMPT_RT).
>
> Disable preemption during updates of the per-CPU variables in the
> paths which do not explicitly disable interrupts.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

So it's like c68ed7945701 ("mm/vmstat: protect per cpu variables with
preempt disable on RT") but we still don't want a wrapper for those
constructs so they don't spread further, right :)

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/memcontrol.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 36d27db673ca9..3d1b7cdd83db0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -667,6 +667,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
>  	memcg = pn->memcg;
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	/* Update memcg */
>  	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
>
> @@ -674,6 +676,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
>
>  	memcg_rstat_updated(memcg, val);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  /**
> @@ -756,8 +760,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>  	if (mem_cgroup_disabled())
>  		return;
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
>  	memcg_rstat_updated(memcg, count);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
> @@ -7194,9 +7202,18 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>  	 * i_pages lock which is taken with interrupts-off. It is
>  	 * important here to have the interrupts disabled because it is the
>  	 * only synchronisation we have for updating the per-CPU variables.
> +	 * On PREEMPT_RT interrupts are never disabled and the updates to per-CPU
> +	 * variables are synchronised by keeping preemption disabled.
>  	 */
> -	VM_BUG_ON(!irqs_disabled());
> -	mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		VM_BUG_ON(!irqs_disabled());
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	} else {
> +		preempt_disable();
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +		preempt_enable();
> +	}
> +
>  	memcg_check_events(memcg, page_to_nid(page));
>
>  	css_put(&memcg->css);
On 2022-01-26 11:06:40 [+0100], Vlastimil Babka wrote:
> So it's like c68ed7945701 ("mm/vmstat: protect per cpu variables with
> preempt disable on RT") but we still don't want a wrapper for those
> constructs so they don't spread further, right :)

Right. We only have them because of assumptions based on
spin_lock_irq() which are not true on PREEMPT_RT. Having a generic
wrapper might lead people to use it for the wrong reasons. The commit
you mentioned changed all users, while here I only changed those which
were missed. Also I wasn't sure if preemption should be disabled or
interrupts (to align with the other local_irq_save()).

> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thank you.

Sebastian
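The spin_lock_irq() assumption discussed here can be sketched minimally as below; the lock and counter names are illustrative, not taken from the kernel tree:

```c
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Illustrative names, not from the patch. */
static DEFINE_SPINLOCK(example_lock);
static DEFINE_PER_CPU(unsigned long, example_counter);

static void example_update(void)
{
	spin_lock_irq(&example_lock);
	/*
	 * !PREEMPT_RT: interrupts are disabled here, so the non-atomic
	 * read-modify-write below cannot be interrupted.
	 *
	 * PREEMPT_RT: spinlock_t is a sleeping lock which leaves both
	 * interrupts and preemption enabled (only migration is disabled
	 * while the lock is held). In memcg the per-CPU counters are
	 * updated from several paths under *different* locks, so it was
	 * only the interrupts-off side effect, not the lock itself, that
	 * kept the RMW atomic; on PREEMPT_RT that protection is gone.
	 */
	__this_cpu_add(example_counter, 1);
	spin_unlock_irq(&example_lock);
}
```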
On Tue 25-01-22 17:43:35, Sebastian Andrzej Siewior wrote:
> The per-CPU counters are modified with non-atomic operations.
> Consistency is ensured by disabling interrupts for the update. On
> non-PREEMPT_RT configurations this works because acquiring a
> spinlock_t typed lock with the _irq() suffix disables interrupts. On
> PREEMPT_RT configurations it does not, so the RMW operation can be
> interrupted.
>
> Another problem is that mem_cgroup_swapout() expects to be invoked
> with interrupts disabled because its caller acquires a spinlock_t
> with interrupts disabled. Since spinlock_t never disables interrupts
> on PREEMPT_RT, interrupts are never disabled at this point.
>
> The code is never called from in_irq() context on PREEMPT_RT,
> therefore disabling preemption during the update is sufficient on
> PREEMPT_RT. The sections which explicitly disable interrupts can
> remain unchanged on PREEMPT_RT because they are short and don't
> involve sleeping locks (memcg_check_events() does nothing on
> PREEMPT_RT).
>
> Disable preemption during updates of the per-CPU variables in the
> paths which do not explicitly disable interrupts.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I can see that you have already discussed the choice of the open-coded
version with Vlastimil. I do not have a strong opinion, but I have to
say I dislike the construct because its purpose is not really clear
just from reading the code. A wrapper could document the underlying
problem, including potential asserts for !PREEMPT_RT. One way or the
other, the change looks correct AFAICS.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 36d27db673ca9..3d1b7cdd83db0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -667,6 +667,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
>  	memcg = pn->memcg;
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	/* Update memcg */
>  	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
>
> @@ -674,6 +676,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
>
>  	memcg_rstat_updated(memcg, val);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  /**
> @@ -756,8 +760,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>  	if (mem_cgroup_disabled())
>  		return;
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
>  	memcg_rstat_updated(memcg, count);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
> @@ -7194,9 +7202,18 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>  	 * i_pages lock which is taken with interrupts-off. It is
>  	 * important here to have the interrupts disabled because it is the
>  	 * only synchronisation we have for updating the per-CPU variables.
> +	 * On PREEMPT_RT interrupts are never disabled and the updates to per-CPU
> +	 * variables are synchronised by keeping preemption disabled.
>  	 */
> -	VM_BUG_ON(!irqs_disabled());
> -	mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		VM_BUG_ON(!irqs_disabled());
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	} else {
> +		preempt_disable();
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +		preempt_enable();
> +	}
> +
>  	memcg_check_events(memcg, page_to_nid(page));
>
>  	css_put(&memcg->css);
> --
> 2.34.1
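A wrapper along the lines Michal suggests might look like the sketch below; the helper names and the assert are hypothetical, not part of the posted patch:

```c
#include <linux/irqflags.h>
#include <linux/mmdebug.h>
#include <linux/preempt.h>

/*
 * Hypothetical helpers (not in the posted patch): a named pair would
 * document why the protection exists and could assert the expected
 * context on !PREEMPT_RT, as suggested above.
 */
static inline void memcg_stats_lock(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		/* Keep the non-atomic per-CPU RMW uninterrupted. */
		preempt_disable();
	else
		/* Callers must already have interrupts disabled. */
		VM_BUG_ON(!irqs_disabled());
}

static inline void memcg_stats_unlock(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_enable();
}
```

With such helpers, the per-CPU updates in __mod_memcg_lruvec_state() and __count_memcg_events() would simply be bracketed by memcg_stats_lock()/memcg_stats_unlock(), keeping the reasoning in one place.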
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 36d27db673ca9..3d1b7cdd83db0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -667,6 +667,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -674,6 +676,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg, val);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 /**
@@ -756,8 +760,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg, count);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7194,9 +7202,18 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * i_pages lock which is taken with interrupts-off. It is
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
+	 * On PREEMPT_RT interrupts are never disabled and the updates to per-CPU
+	 * variables are synchronised by keeping preemption disabled.
 	 */
-	VM_BUG_ON(!irqs_disabled());
-	mem_cgroup_charge_statistics(memcg, -nr_entries);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		VM_BUG_ON(!irqs_disabled());
+		mem_cgroup_charge_statistics(memcg, -nr_entries);
+	} else {
+		preempt_disable();
+		mem_cgroup_charge_statistics(memcg, -nr_entries);
+		preempt_enable();
+	}
+
 	memcg_check_events(memcg, page_to_nid(page));
 
 	css_put(&memcg->css);
The per-CPU counters are modified with non-atomic operations.
Consistency is ensured by disabling interrupts for the update. On
non-PREEMPT_RT configurations this works because acquiring a spinlock_t
typed lock with the _irq() suffix disables interrupts. On PREEMPT_RT
configurations it does not, so the RMW operation can be interrupted.

Another problem is that mem_cgroup_swapout() expects to be invoked with
interrupts disabled because its caller acquires a spinlock_t with
interrupts disabled. Since spinlock_t never disables interrupts on
PREEMPT_RT, interrupts are never disabled at this point.

The code is never called from in_irq() context on PREEMPT_RT, therefore
disabling preemption during the update is sufficient on PREEMPT_RT. The
sections which explicitly disable interrupts can remain unchanged on
PREEMPT_RT because they are short and don't involve sleeping locks
(memcg_check_events() does nothing on PREEMPT_RT).

Disable preemption during updates of the per-CPU variables in the paths
which do not explicitly disable interrupts.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/memcontrol.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)
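To make the failure mode concrete, here is a sketch (an illustration, not from the thread) of how the non-atomic RMW can lose an update when it is preempted on PREEMPT_RT:

```c
/*
 * Illustration only: __this_cpu_add() typically compiles to a plain
 * load/add/store sequence. With neither interrupts nor preemption
 * disabled, two tasks on the same CPU can interleave and lose an
 * update:
 *
 *   Task A (CPU0)               Task B (CPU0, preempts A)
 *   -----------------------     -------------------------
 *   tmp = counter;      // 5
 *                               tmp = counter;      // 5
 *                               counter = tmp + 1;  // 6
 *   counter = tmp + 1;  // 6, B's increment is lost
 *
 * Disabling preemption around the RMW (the patch's approach on
 * PREEMPT_RT) closes this window; disabling interrupts (the existing
 * !PREEMPT_RT scheme) additionally excludes interrupt handlers.
 */
```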