[v5,0/6] mm/memcg: Address PREEMPT_RT problems instead of disabling it.

Message ID 20220226204144.1008339-1-bigeasy@linutronix.de

Sebastian Andrzej Siewior Feb. 26, 2022, 8:41 p.m. UTC
Hi,

this series aims to address the memcg-related problems on PREEMPT_RT.

I tested them on CONFIG_PREEMPT and CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests and I haven't observed any
regressions (other than the lockdep report that is already there).

Changes since v4:
- Added additional counter indices to the check in
  __mod_memcg_lruvec_state() which are updated with interrupts enabled
  but with preemption disabled. Also disable these checks on
  PREEMPT_RT. Reported by Shakeel Butt. A sketch of the check follows
  this list.

- Added an additional comment regarding `obj' in drain_obj_stock().

- Disable migration in drain_all_stock() and drain the local stock
  instead of scheduling a worker.
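
To make this a bit more concrete, below is a minimal sketch of the kind
of per-index context check meant in the first item above. The helper
name memcg_stats_check_context() and the exact set of indices are only
illustrative and not taken verbatim from the patch:

/*
 * Sketch only: assumes the rmap/THP counters are updated from task
 * context with interrupts enabled (and preemption disabled), while all
 * other counters are updated with interrupts disabled. On PREEMPT_RT
 * interrupts are not disabled, so the check is skipped.
 */
static void memcg_stats_check_context(enum node_stat_item idx)
{
        if (IS_ENABLED(CONFIG_PREEMPT_RT))
                return;

        switch (idx) {
        case NR_ANON_MAPPED:
        case NR_FILE_MAPPED:
        case NR_ANON_THPS:
        case NR_SHMEM_PMDMAPPED:
        case NR_FILE_PMDMAPPED:
                /* Updated from task context only, interrupts enabled. */
                WARN_ON_ONCE(!in_task());
                break;
        default:
                WARN_ON_ONCE(!irqs_disabled());
        }
}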

Changes since v3:
- Added __memcg_stats_lock() to __mod_memcg_lruvec_state(). This
  variant does not check for disabled interrupts on !PREEMPT_RT. Its
  only user (__mod_memcg_lruvec_state()) checks that it runs in task
  context (neither soft nor hard irq) if one of the two indices used by
  rmap.c is updated, and otherwise it checks for disabled interrupts.
  Reported by Shakeel Butt.

- In drain_all_stock() migration is disabled and drain_local_stock() is
  invoked directly if the requested CPU is the local CPU. A sketch of
  this follows the list.
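
As a rough illustration of the drain_all_stock() change; the helper
stock_needs_drain() is made up for brevity, the real function has more
bookkeeping around the cached memcg and the FLUSHING_CACHED_CHARGE bit:

/*
 * Sketch only: migrate_disable() keeps the task on the current CPU, so
 * the local per-CPU stock can be drained synchronously instead of
 * scheduling a worker for it; remote CPUs still get their worker.
 */
static void drain_all_stock_sketch(struct mem_cgroup *root_memcg)
{
        int cpu, curcpu;

        migrate_disable();
        curcpu = smp_processor_id();
        for_each_online_cpu(cpu) {
                struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

                if (!stock_needs_drain(stock, root_memcg))      /* made up */
                        continue;

                if (cpu == curcpu)
                        drain_local_stock(&stock->work);
                else
                        schedule_work_on(cpu, &stock->work);
        }
        migrate_enable();
}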

v3: https://lore.kernel.org/all/20220217094802.3644569-1-bigeasy@linutronix.de/

Changes since v2:
- rebased on top of v5.17-rc4-mmots-2022-02-15-20-39.

- Added memcg_stats_lock() in 3/5 so it is a little more obvious and
  hopefully easier to maintain. A sketch of such a helper pair follows
  this list.

- Opencoded obj_cgroup_uncharge_pages() in drain_obj_stock(). The
  __locked suffix was confusing.
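
For illustration, a hedged sketch of what such a lock/unlock pair can
look like; the real helpers in patch 3 may differ in how they assert
the !PREEMPT_RT case:

/*
 * Sketch only: on !PREEMPT_RT the callers already run with interrupts
 * disabled, so the "lock" side merely asserts that. On PREEMPT_RT
 * interrupts stay enabled and disabling preemption keeps the per-CPU
 * counter updates atomic on this CPU.
 */
static void memcg_stats_lock(void)
{
        if (IS_ENABLED(CONFIG_PREEMPT_RT))
                preempt_disable();
        else
                lockdep_assert_irqs_disabled();
}

static void memcg_stats_unlock(void)
{
        if (IS_ENABLED(CONFIG_PREEMPT_RT))
                preempt_enable();
}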

v2: https://lore.kernel.org/all/20220211223537.2175879-1-bigeasy@linutronix.de/

Changes since v1:
- Made a full patch from Michal Hocko's diff to disable the from-IRQ vs
  from-task optimisation.

- Disabling the threshold event handlers now uses
  IS_ENABLED(CONFIG_PREEMPT_RT) instead of an #ifdef. The outcome is the
  same but there is no need to shuffle the code around. A sketch of the
  pattern follows this list.
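
The IS_ENABLED() pattern referred to above, as a generic sketch; the
function name and the rejected operation are purely illustrative, not
the actual hunk from the patch:

/*
 * Sketch only: with IS_ENABLED() the !RT path is still compiled and
 * type-checked in every configuration and the compiler drops the dead
 * branch, whereas an #ifdef would require rearranging the code.
 */
static int event_control_write_sketch(void)
{
        if (IS_ENABLED(CONFIG_PREEMPT_RT))
                return -EOPNOTSUPP;

        /* ... regular handling for !PREEMPT_RT continues here ... */
        return 0;
}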

v1: https://lore.kernel.org/all/20220125164337.2071854-1-bigeasy@linutronix.de/

Changes since the RFC:
- cgroup.event_control / memory.soft_limit_in_bytes is disabled on
  PREEMPT_RT. It is a deprecated v1 feature. Fixing the signal path is
  not worth it.

- The updates to per-CPU counters are usually synchronised by disabling
  interrupts. There are a few spots where the assumption about disabled
  interrupts is not true on PREEMPT_RT and therefore preemption is
  disabled instead. This is okay since the counters are never written
  from in_irq() context. A sketch of both schemes follows this list.
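
To illustrate the last point, a sketch of the two protection schemes
side by side; example_stat and example_stat_add() are made up for the
example:

/* Made-up per-CPU counter for the sketch. */
static DEFINE_PER_CPU(long, example_stat);

/*
 * Sketch only: on !PREEMPT_RT disabling interrupts protects the
 * read-modify-write against both other tasks and interrupt handlers.
 * On PREEMPT_RT the counter is never written from in_irq() context,
 * so disabling preemption is sufficient to keep the update atomic on
 * this CPU.
 */
static void example_stat_add(long delta)
{
        if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
                preempt_disable();
                __this_cpu_add(example_stat, delta);
                preempt_enable();
        } else {
                unsigned long flags;

                local_irq_save(flags);
                __this_cpu_add(example_stat, delta);
                local_irq_restore(flags);
        }
}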

RFC: https://lore.kernel.org/all/20211222114111.2206248-1-bigeasy@linutronix.de/

Sebastian