
[v3,0/9] cgroup/cpuset: Add new cpuset partition type & empty effective cpus

Message ID 20210720141834.10624-1-longman@redhat.com

Message

Waiman Long July 20, 2021, 2:18 p.m. UTC
v3:
 - Add two new patches (patches 2 & 3) to fix bugs found during the
   testing process.
 - Add a new patch to enable inotify event notification when a partition
   becomes invalid.
 - Add a test for event notification when a partition becomes invalid.

v2:
 - Drop v1 patch 1.
 - Break out some cosmetic changes into a separate patch (patch #1).
 - Add a new patch to clarify the transition to invalid partition root
   is mainly caused by hotplug events.
 - Enhance the partition root state test including CPU online/offline
   behavior and fix issues found by the test.

This patchset fixes two bugs and makes four enhancements to the cpuset
v2 code.

Bug fixes:

 Patch 2: Fix a hotplug handling bug that occurs when all the CPUs in
 subparts_cpus, and only those, are offlined.

 Patch 3: Fix violation of cpuset locking rule.

Enhancements: 

 Patch 4: Enable event notification on "cpuset.cpus.partition" when
 a partition becomes invalid.

 Patch 5: Clarify the use of the invalid partition root state and add new
 checks to make sure that normal cpuset control file operations cannot
 create an invalid partition root. It also fixes some issues in the
 existing code.

 Patch 6: Add a new partition state "isolated" to create a partition
 root without load balancing. This is for handling intermittent workloads
 that have strict low-latency requirements.

 Patch 7: Allow partition roots that are not the top cpuset to distribute
 all their CPUs to child partitions as long as no tasks are associated
 with that partition root. This gives middleware more flexibility in
 managing multiple partitions.

Patch 8 updates the cgroup-v2.rst file accordingly. Patch 9 adds a new
selftest for the cpuset partition code.

Waiman Long (9):
  cgroup/cpuset: Miscellaneous code cleanup
  cgroup/cpuset: Fix a partition bug with hotplug
  cgroup/cpuset: Fix violation of cpuset locking rule
  cgroup/cpuset: Enable event notification when partition become invalid
  cgroup/cpuset: Clarify the use of invalid partition root
  cgroup/cpuset: Add a new isolated cpus.partition type
  cgroup/cpuset: Allow non-top parent partition root to distribute out
    all CPUs
  cgroup/cpuset: Update description of cpuset.cpus.partition in
    cgroup-v2.rst
  kselftest/cgroup: Add cpuset v2 partition root state test

 Documentation/admin-guide/cgroup-v2.rst       |  94 ++-
 kernel/cgroup/cpuset.c                        | 360 +++++++---
 tools/testing/selftests/cgroup/Makefile       |   5 +-
 .../selftests/cgroup/test_cpuset_prs.sh       | 626 ++++++++++++++++++
 tools/testing/selftests/cgroup/wait_inotify.c |  67 ++
 5 files changed, 1007 insertions(+), 145 deletions(-)
 create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
 create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c

Comments

Tejun Heo July 26, 2021, 11:17 p.m. UTC | #1
Hello,

On Tue, Jul 20, 2021 at 10:18:25AM -0400, Waiman Long wrote:
> v3:
>  - Add two new patches (patches 2 & 3) to fix bugs found during the
>    testing process.
>  - Add a new patch to enable inotify event notification when partition
>    become invalid.
>  - Add a test to test event notification when partition become invalid.

I applied parts of the series. I think there was a bit of miscommunication.
I meant that we should use the invalid state as the only way to indicate
errors as long as the error state is something which can be reached through
hot unplug or other uncontrollable changes, and require users to monitor the
state transitions for confirmation and error handling.

Thanks.
Waiman Long July 27, 2021, 9:14 p.m. UTC | #2
On 7/26/21 7:17 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 20, 2021 at 10:18:25AM -0400, Waiman Long wrote:
>> v3:
>>   - Add two new patches (patches 2 & 3) to fix bugs found during the
>>     testing process.
>>   - Add a new patch to enable inotify event notification when partition
>>     become invalid.
>>   - Add a test to test event notification when partition become invalid.
> I applied parts of the series. I think there was a bit of miscommunication.
> I meant that we should use the invalid state as the only way to indicate
> errors as long as the error state is something which can be reached through
> hot unplug or other uncontrollable changes, and require users to monitor the
> state transitions for confirmation and error handling.

Yes, that is the point of adding the event notification patch.

In the current code, direct writes to cpuset.cpus.partition are strictly 
controlled and invalid transitions are rejected. However, CPU hotplug, as 
well as changes to cpuset.cpus that do not break the CPU exclusivity rule, 
may cause a partition to become invalid. This patchset currently adds 
extra guards to reject those cpuset.cpus changes that would cause the 
partition to become invalid, since changes that break the CPU exclusivity 
rule are rejected anyway. If this is what you want, I can leave out those 
extra guards and let such cpuset.cpus changes go through, turning the 
partition invalid instead.

However, in a complicated partition setup with multiple child partitions, 
an invalid cpuset.cpus change in a parent partition will cause all the 
child partitions to become invalid too. That is the scenario that I 
don't want to happen inadvertently. Alternatively, we can reject those 
invalid changes if a child partition exists, and let them pass through 
(making the partition invalid) if it is a standalone partition.

Please let me know which approach you want me to take.

Cheers,
Longman
Tejun Heo Aug. 9, 2021, 10:46 p.m. UTC | #3
Hello, Waiman. Sorry about the delay. Was off for a while.

On Tue, Jul 27, 2021 at 05:14:27PM -0400, Waiman Long wrote:
> However, if we have a complicated partition setup with multiple child
> partitions. Invalid cpuset.cpus change in a parent partition will cause all
> the child partitions to become invalid too. That is the scenario that I
> don't want to happen inadvertently. Alternatively, we can restrict those

I don't think there's anything fundamentally wrong with it given the
requirement that userland has to monitor invalid state transitions.
The same mass transition can happen through cpu hotplug operations,
right?

> invalid changes if a child partition exist and let it pass through and make
> it invalid if it is a standalone partition.
> 
> Please let me know which approach do you want me to take.

I think it'd be best if we can stick to some principles rather than
trying to adjust it for specific scenarios. e.g.:

* If a given state can be reached through cpu hot [un]plug, any
  configuration attempt which reaches the same state should be allowed
  with the same end result as cpu hot [un]plug.

* If a given state can't ever be reached in whichever way, the
  configuration attempting to reach such state should be rejected.

Thanks.
Waiman Long Aug. 10, 2021, 1:12 a.m. UTC | #4
On 8/9/21 6:46 PM, Tejun Heo wrote:
> Hello, Waiman. Sorry about the delay. Was off for a while.
>
> On Tue, Jul 27, 2021 at 05:14:27PM -0400, Waiman Long wrote:
>> However, if we have a complicated partition setup with multiple child
>> partitions. Invalid cpuset.cpus change in a parent partition will cause all
>> the child partitions to become invalid too. That is the scenario that I
>> don't want to happen inadvertently. Alternatively, we can restrict those
> I don't think there's anything fundamentally wrong with it given the
> requirement that userland has to monitor invalid state transitions.
> The same mass transition can happen through cpu hotplug operations,
> right?
>
>> invalid changes if a child partition exist and let it pass through and make
>> it invalid if it is a standalone partition.
>>
>> Please let me know which approach do you want me to take.
> I think it'd be best if we can stick to some principles rather than
> trying to adjust it for specific scenarios. e.g.:
>
> * If a given state can be reached through cpu hot [un]plug, any
>    configuration attempt which reaches the same state should be allowed
>    with the same end result as cpu hot [un]plug.
>
> * If a given state can't ever be reached in whichever way, the
>    configuration attempting to reach such state should be rejected.

OK, I got it. I will make the necessary changes and submit a new patch 
series.

Thanks,
Longman