Message ID | 20201124155039.13804-10-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | An alternative series for asymmetric AArch32 systems | expand |
On 11/24/20 15:50, Will Deacon wrote: > If the scheduler cannot find an allowed CPU for a task, > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask > if cgroup v1 is in use. > > In preparation for allowing architectures to provide their own fallback > mask, just return early if we're not using cgroup v2 and allow > select_fallback_rq() to figure out the mask by itself. What about cpuset_attach()? When a task attaches to a new group its affinity could be reset to possible_cpu_mask if it moves to top_cpuset or the intersection of effective_cpus & cpu_online_mask. Probably handled with later patches. /me continue to look at the rest Okay so in patch 11 we make set_cpus_allowed_ptr() fail. cpuset_attach will just do WARN_ON_ONCE() and carry on. I think we can fix that by making sure cpuset_can_attach() will fail. Which can be done by fixing task_can_attach() to take into account the new arch task_cpu_possible_mask()? Thanks -- Qais Yousef > > Cc: Li Zefan <lizefan@huawei.com> > Cc: Tejun Heo <tj@kernel.org> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Reviewed-by: Quentin Perret <qperret@google.com> > Signed-off-by: Will Deacon <will@kernel.org> > --- > kernel/cgroup/cpuset.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index 57b5b5d0a5fd..e970737c3ed2 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask) > > void cpuset_cpus_allowed_fallback(struct task_struct *tsk) > { > + if (!is_in_v2_mode()) > + return; /* select_fallback_rq will try harder */ > + > rcu_read_lock(); > - do_set_cpus_allowed(tsk, is_in_v2_mode() ? > - task_cs(tsk)->cpus_allowed : cpu_possible_mask); > + do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed); > rcu_read_unlock(); > > /* > -- > 2.29.2.454.gaff20da3a2-goog >
On 11/27/20 13:32, Qais Yousef wrote: > On 11/24/20 15:50, Will Deacon wrote: > > If the scheduler cannot find an allowed CPU for a task, > > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask > > if cgroup v1 is in use. > > > > In preparation for allowing architectures to provide their own fallback > > mask, just return early if we're not using cgroup v2 and allow > > select_fallback_rq() to figure out the mask by itself. > > What about cpuset_attach()? When a task attaches to a new group its affinity > could be reset to possible_cpu_mask if it moves to top_cpuset or the > intersection of effective_cpus & cpu_online_mask. > > Probably handled with later patches. > > /me continue to look at the rest > > Okay so in patch 11 we make set_cpus_allowed_ptr() fail. cpuset_attach will > just do WARN_ON_ONCE() and carry on. > > I think we can fix that by making sure cpuset_can_attach() will fail. Which can > be done by fixing task_can_attach() to take into account the new arch > task_cpu_possible_mask()? I ran some experiments and indeed we have some problems here :( I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of both. If I try to move my test binary to 64bit cpuset, it moves there and I see the WARN_ON_ONCE() triggered. The task has attached to the new cpuset but set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had previously. Breaking cpusets effectively. This can be easily fixable with my suggestion to educate task_can_attach() to take into account the new arch_task_cpu_cpu_possible_mask(). The attachment will fail and userspace will get an error then. But this introduces another issue.. If I use cpumask_subset() in task_can_attach(), then an attempt to move to 'mix' cpuset will fail. In Android foreground cpuset usually contains a mixture of cpus. So we want this to work. Also 'background' tasks are usually bound to littles. If the littles are 64bit only, then user space must ensure to add a 32bit capable cpu to the list. If we can't attach to mixed cpuset, then this workaround won't work and Android will have to create different cpusets for 32bit and 64bit apps. Which would be difficult since 64bit apps do embed 32bit binaries and it'd be hard for userspace to deal with that. Unless we extend execve syscall to take a cgroup as an arg to attach to (like clone3 syscall), then potentially we can do something about it, I think, by 'fixing' the libc wrapper. If I use cpumask_intersects() to validate the attachment, then the 64bit attachment will fail but the 'mix' one will succeed, as desired. BUT we'll re-introduce the WARN_ON_ONCE() again since __set_cpus_allowed_ptr_locked() validates against arch_task_cpu_possible_mask() with cpumask_subset() :-/ ie: cpuset_attach()::set_cpus_allowed_ptr() will fail with just a warning printed once when attaching to 'mix'. We can fix this by making __set_cpus_allowed_ptr_locked() perform cpumask_and() like restrict_cpus_allowed_ptr() does. Though this will not only truncate cpuset mask with the intersection of arch_task_cpu_possible_mask(), but also truncate it for sched_setaffinity() syscall. Something to keep in mind. Finally there's the ugly problem that I'm not sure how to address of modifying the cpuset.cpus of, for example 'mix' cpuset, such that we end up with 64bit only cpus. If the 32bit task has already attached, then we'll end up with a broken cpuset with a task attached having 'invalid' cpumask. We can teach cpuset.c:update_cpumask()::validate_change() to do something about it. By either walking the attached tasks and validating the new mask against the arch one. Or by storing the arch mask in struct cpuset. Though for the latter if we want to be truly generic, we need to ensure we handle the case of 2 tasks having different arch_task_cpu_possible_mask(). Long email. Hopefully it makes sense! I can share my test script, binary/code and fixup patches if you like. Thanks -- Qais Yousef
On Monday 30 Nov 2020 at 17:05:31 (+0000), Qais Yousef wrote: > I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains > all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of > both. > > If I try to move my test binary to 64bit cpuset, it moves there and I see the > WARN_ON_ONCE() triggered. The task has attached to the new cpuset but > set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had > previously. Breaking cpusets effectively. Right, and so does exec'ing from a 64 bit task into 32 bit executable from within a 64 bit-only cpuset :( . And there is nothing we can really do about it, we cannot fail the exec because we need this to work for existing apps, and there is no way the Android framework can know upfront. So the only thing we can do really is WARN() and proceed to ignore the cpuset, which is what this series does :/. It's not exactly pretty but I don't think we can do much better than that TBH, and it's the same thing for the example you brought up. Failing cpuset_can_attach() will not help, we can only WARN and proceed ... Now, Android should be fine with that I think. We only need the kernel to implement a safe fallback mechanism when userspace gives contradictory commands, because we know there are edge cases userspace _cannot_ deal with correctly, but this fallback doesn't need to be highly optimized (at least for Android), but I'm happy to hear what others think. Thanks, Quentin
On 11/30/20 17:36, Quentin Perret wrote: > On Monday 30 Nov 2020 at 17:05:31 (+0000), Qais Yousef wrote: > > I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains > > all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of > > both. > > > > If I try to move my test binary to 64bit cpuset, it moves there and I see the > > WARN_ON_ONCE() triggered. The task has attached to the new cpuset but > > set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had > > previously. Breaking cpusets effectively. > > Right, and so does exec'ing from a 64 bit task into 32 bit executable > from within a 64 bit-only cpuset :( . And there is nothing we can really True. The kernel can decide to kill the task or force detach it then, no? Sending SIGKILL makes more sense. > do about it, we cannot fail the exec because we need this to work for > existing apps, and there is no way the Android framework can know > upfront. It knows upfront it has enabled asym aarch32. So it needs to make sure not to create 'invalid' cpusets? > > So the only thing we can do really is WARN() and proceed to ignore the > cpuset, which is what this series does :/. It's not exactly pretty but I > don't think we can do much better than that TBH, and it's the same thing > for the example you brought up. Failing cpuset_can_attach() will not > help, we can only WARN and proceed ... I think for cases where we can prevent userspace from doing something wrong, we should. Like trying to attach to a cpuset that will result in an empty mask. FWIW, it does something similar with deadline tasks. See task_can_attach(). Similarly for the case when userspace tries to modify the cpuset.cpus such that a task will end up with empty cpumask. We now have the new case that some tasks can only run on a subset of cpu_possible_mask. So the definition of empty cpumask has gained an extra meaning. > > Now, Android should be fine with that I think. We only need the kernel > to implement a safe fallback mechanism when userspace gives > contradictory commands, because we know there are edge cases userspace > _cannot_ deal with correctly, but this fallback doesn't need to be > highly optimized (at least for Android), but I'm happy to hear what > others think. Why not go with our original patch that fixes affinity then in the arch code if the task wakes up on the wrong cpu? It is much simpler approach IMO to achieve the same thing. I was under the impression that if we go down teaching the scheduler about asym ISA, then we have to deal with these edge cases. I don't think we're far away from getting there. Thanks -- Qais Yousef
On Tuesday 01 Dec 2020 at 11:58:42 (+0000), Qais Yousef wrote: > On 11/30/20 17:36, Quentin Perret wrote: > > On Monday 30 Nov 2020 at 17:05:31 (+0000), Qais Yousef wrote: > > > I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains > > > all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of > > > both. > > > > > > If I try to move my test binary to 64bit cpuset, it moves there and I see the > > > WARN_ON_ONCE() triggered. The task has attached to the new cpuset but > > > set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had > > > previously. Breaking cpusets effectively. > > > > Right, and so does exec'ing from a 64 bit task into 32 bit executable > > from within a 64 bit-only cpuset :( . And there is nothing we can really > > True. The kernel can decide to kill the task or force detach it then, no? > Sending SIGKILL makes more sense. Yeah but again, we need this to work for existing apps. Just killing it basically means we have no support, so that doesn't work for the use case :/ > > do about it, we cannot fail the exec because we need this to work for > > existing apps, and there is no way the Android framework can know > > upfront. > > It knows upfront it has enabled asym aarch32. So it needs to make sure not to > create 'invalid' cpusets? Problem is, we _really_ don't want to keep a big CPU in the background cpuset just for that. And even if we did, we'd have to deal with hotplug. > > > > So the only thing we can do really is WARN() and proceed to ignore the > > cpuset, which is what this series does :/. It's not exactly pretty but I > > don't think we can do much better than that TBH, and it's the same thing > > for the example you brought up. Failing cpuset_can_attach() will not > > help, we can only WARN and proceed ... > > I think for cases where we can prevent userspace from doing something wrong, we > should. Like trying to attach to a cpuset that will result in an empty mask. > FWIW, it does something similar with deadline tasks. See task_can_attach(). > > Similarly for the case when userspace tries to modify the cpuset.cpus such that > a task will end up with empty cpumask. We now have the new case that some tasks > can only run on a subset of cpu_possible_mask. So the definition of empty > cpumask has gained an extra meaning. I see this differently, e.g. if you affine a task to a CPU and you hotunplug it, then the kernel falls back to the remaining online CPUs for that task. Not pretty, but it keeps things functional. I'm thinking a similar kind of support would be good enough here. But yes, this cpuset mess is the part of the series I hate most too. It's just not clear we have better solutions :/ > > > > Now, Android should be fine with that I think. We only need the kernel > > to implement a safe fallback mechanism when userspace gives > > contradictory commands, because we know there are edge cases userspace > > _cannot_ deal with correctly, but this fallback doesn't need to be > > highly optimized (at least for Android), but I'm happy to hear what > > others think. > > Why not go with our original patch that fixes affinity then in the arch code if > the task wakes up on the wrong cpu? It is much simpler approach IMO to achieve > the same thing. I personally had no issues with that patch, but as per Peter's original reply, that's "not going to happen". Will's proposal seems to go one step further and tries its best to honor the contract with userspace (by keeping the subset of the affinity mask, ...) when that can be done, so if that can be acceptable, then be it. But I'd still rather keep this simple if at all possible. It's just my opinion though :) Thanks, Quentin
On 12/01/20 12:37, Quentin Perret wrote: > On Tuesday 01 Dec 2020 at 11:58:42 (+0000), Qais Yousef wrote: > > On 11/30/20 17:36, Quentin Perret wrote: > > > On Monday 30 Nov 2020 at 17:05:31 (+0000), Qais Yousef wrote: > > > > I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains > > > > all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of > > > > both. > > > > > > > > If I try to move my test binary to 64bit cpuset, it moves there and I see the > > > > WARN_ON_ONCE() triggered. The task has attached to the new cpuset but > > > > set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had > > > > previously. Breaking cpusets effectively. > > > > > > Right, and so does exec'ing from a 64 bit task into 32 bit executable > > > from within a 64 bit-only cpuset :( . And there is nothing we can really > > > > True. The kernel can decide to kill the task or force detach it then, no? > > Sending SIGKILL makes more sense. > > Yeah but again, we need this to work for existing apps. Just killing it > basically means we have no support, so that doesn't work for the use > case :/ I don't get you. Unless what you're saying is you want to support this without userspace having to do anything to opt-in, then this statement is hard to digest for me. Existing apps will work. OEMs and vendors who want to enable this feature must configure their userspace to deal with this correctly. This includes passing the cmdline option, and parsing the cpumask in /sys/device/system/cpu/aarch32_el0 to ensure their cpusets contains at least one cpu from this list. > > > > do about it, we cannot fail the exec because we need this to work for > > > existing apps, and there is no way the Android framework can know > > > upfront. > > > > It knows upfront it has enabled asym aarch32. So it needs to make sure not to > > create 'invalid' cpusets? > > Problem is, we _really_ don't want to keep a big CPU in the background I don't get you here too. AFAIU, OEMs have to define their cpusets. So it makes sense to me for them to define it correctly if they want to enable asym aarch32. Systems that don't care about this feature shouldn't be affected. If they do, then I'm missing something. > cpuset just for that. And even if we did, we'd have to deal with hotplug. We deal with hotplug by not allowing one of the aarch32 cpus from going offline. > > > > > > > So the only thing we can do really is WARN() and proceed to ignore the > > > cpuset, which is what this series does :/. It's not exactly pretty but I > > > don't think we can do much better than that TBH, and it's the same thing > > > for the example you brought up. Failing cpuset_can_attach() will not > > > help, we can only WARN and proceed ... > > > > I think for cases where we can prevent userspace from doing something wrong, we > > should. Like trying to attach to a cpuset that will result in an empty mask. > > FWIW, it does something similar with deadline tasks. See task_can_attach(). > > > > Similarly for the case when userspace tries to modify the cpuset.cpus such that > > a task will end up with empty cpumask. We now have the new case that some tasks > > can only run on a subset of cpu_possible_mask. So the definition of empty > > cpumask has gained an extra meaning. > > I see this differently, e.g. if you affine a task to a CPU and you > hotunplug it, then the kernel falls back to the remaining online CPUs > for that task. Not pretty, but it keeps things functional. I'm thinking > a similar kind of support would be good enough here. Sorry I can't see it this way :( The comparison is fundamentally different. Being able to attach to a cpuset, or modify its cpus is different than short circuiting a cpu out of the system. For hotplug we have to make sure a single cpu stays alive. The fallback you're talking about should still work the same if the task is not attached to a cpuset. Just it has to take the intersection with the arch_task_cpu_possible_cpu() into account. For cpusets, if hotunplug results in an empty cpuset, then all tasks are moved to the nearest ancestor if I read the code correctly. In our case, only 32bit tasks have to move out to retain this behavior. Since now for the first time we have tasks that can't run on all cpus. Which by the way might be the right behavior for 64bit tasks execing 32bit binary in a 64bit only cpuset. I suggested SIGKILL'ing them but maybe moving them to the nearest ancestor too is more aligned with the behavior above. > > But yes, this cpuset mess is the part of the series I hate most too. > It's just not clear we have better solutions :/ > > > > > > > Now, Android should be fine with that I think. We only need the kernel > > > to implement a safe fallback mechanism when userspace gives > > > contradictory commands, because we know there are edge cases userspace > > > _cannot_ deal with correctly, but this fallback doesn't need to be > > > highly optimized (at least for Android), but I'm happy to hear what > > > others think. > > > > Why not go with our original patch that fixes affinity then in the arch code if > > the task wakes up on the wrong cpu? It is much simpler approach IMO to achieve > > the same thing. > > I personally had no issues with that patch, but as per Peter's original > reply, that's "not going to happen". Will's proposal seems to go one > step further and tries its best to honor the contract with userspace (by The only difference I see is that this series blocks sched_setaffinity(). Other than that we fix up the task affinity in a similar way, just in different places. If the intersection was empty, we sent a SIGKILL; making it fallback to something sensible is dead easy. I opted for SIGKILL because I didn't expect Peter to agree with the fallback. But if we're arguing it's okay now, then it should be okay in that patch too then. > keeping the subset of the affinity mask, ...) when that can be done, so > if that can be acceptable, then be it. But I'd still rather keep this > simple if at all possible. It's just my opinion though :) The way I see it, this series adds support to handle generic asym ISA. Asym AArch32 will be the first and only user for now, but it's paving the way for others. It's up to the maintainers to decide, but IMHO if we'll do this via enabling scheduler to deal with generic asym ISA support, we should deal with these corner cases. Otherwise a specific solution for the current problem at hand in the arch code makes more sense to me. It all depends how much the community wants/is willing to cater for these systems in the future too. Thanks -- Qais Yousef
On Tuesday 01 Dec 2020 at 14:11:21 (+0000), Qais Yousef wrote: > AFAIU, OEMs have to define their cpusets. So it makes sense to me for them to > define it correctly if they want to enable asym aarch32. > > Systems that don't care about this feature shouldn't be affected. If they do, > then I'm missing something. Right, but there are 2 cases for 32 bit tasks in Android: 1. 32 bit apps; these are not an issue, the Android framework knows about them and it's fine to expect it to setup cpusets accordingly IMO. 2. 64 bit apps that also happen to have a 32 bit binary payload, and exec into it. The Android framework has no visibility over that, all it sees is a 64 bit app. Sadly we can't detect this stupid pattern, but we need these to remain somewhat functional. I was only talking about 2. the whole time, sorry if that wasn't clear. With that said, see below for the discussion about cpuset/hotplug. > We deal with hotplug by not allowing one of the aarch32 cpus from going > offline. Sure, but that would only work if we have that 32 bit CPU present in _all_ cpusets, no? What I'd like to avoid is to keep a (big) 32 bit CPU in the background cpuset of 64 bit tasks. That would make that big CPU available to _all_ 64 bit apps in the background, whether they need 32 bit support or not, because again we cannot distinguish them. And yeah, I expect this to be not go down well in practice. So, if we're going to support this, a requirement for Android is that some cpusets will be 64 bit only, and it's possible that we'll exec into 32 bit from within these cpusets. It's an edge case, we don't really want to optimize for it, but it needs to not fall apart completely. I'm not fundamentally against doing smarter things at all, I'm saying we (Android) just don't _need_ smarter things ATM, so we may want to keep it simple. My point in the previous message is, if we're accepting this for exec, a logical next step could be to accept it for cpuset migrations too. Failing the cgroup migration is hard since: there is no guarantee the source cpuset has 32 bit CPUs anyway (assuming the exec'd task is kept in the same cpuset), so why bother; userspace just doesn't know there are 32 bit tasks in an app and would keep trying to migrate it to 64 bit cpuset over and over again; you could end up with apps being stuck halfway through a top-app->background transition where some tasks have migrated but not others, ... It's a bit of a mess :/ <snip> > For hotplug we have to make sure a single cpu stays alive. The fallback you're > talking about should still work the same if the task is not attached to > a cpuset. Just it has to take the intersection with the > arch_task_cpu_possible_cpu() into account. Yep, agreed, there's probably room for improvement there. > For cpusets, if hotunplug results in an empty cpuset, then all tasks are moved > to the nearest ancestor if I read the code correctly. In our case, only 32bit > tasks have to move out to retain this behavior. Since now for the first time we > have tasks that can't run on all cpus. > > Which by the way might be the right behavior for 64bit tasks execing 32bit > binary in a 64bit only cpuset. I suggested SIGKILL'ing them but maybe moving > them to the nearest ancestor too is more aligned with the behavior above. Hmm, I guess that means putting all 32-bit-execd-from-64-bit tasks in the root group in Android. I'll try and check the implications, but that might be just fine... Sounds like a sensible behaviour to me anyways. Thanks, Quentin
On Tue, Dec 01, 2020 at 03:56:49PM +0000, Quentin Perret wrote: > On Tuesday 01 Dec 2020 at 14:11:21 (+0000), Qais Yousef wrote: > > For cpusets, if hotunplug results in an empty cpuset, then all tasks are moved > > to the nearest ancestor if I read the code correctly. In our case, only 32bit > > tasks have to move out to retain this behavior. Since now for the first time we > > have tasks that can't run on all cpus. > > > > Which by the way might be the right behavior for 64bit tasks execing 32bit > > binary in a 64bit only cpuset. I suggested SIGKILL'ing them but maybe moving > > them to the nearest ancestor too is more aligned with the behavior above. > > Hmm, I guess that means putting all 32-bit-execd-from-64-bit tasks in > the root group in Android. I'll try and check the implications, but that > might be just fine... Sounds like a sensible behaviour to me anyways. I'll look into this -- anything we can do to avoid forcefully resetting the affinity mask to the arch_task_cpu_possible_mask() is worth considering. Will
On 12/01/20 15:56, Quentin Perret wrote: > On Tuesday 01 Dec 2020 at 14:11:21 (+0000), Qais Yousef wrote: > > AFAIU, OEMs have to define their cpusets. So it makes sense to me for them to > > define it correctly if they want to enable asym aarch32. > > > > Systems that don't care about this feature shouldn't be affected. If they do, > > then I'm missing something. > > Right, but there are 2 cases for 32 bit tasks in Android: > > 1. 32 bit apps; these are not an issue, the Android framework knows > about them and it's fine to expect it to setup cpusets accordingly > IMO. > > 2. 64 bit apps that also happen to have a 32 bit binary payload, and > exec into it. The Android framework has no visibility over that, > all it sees is a 64 bit app. Sadly we can't detect this stupid > pattern, but we need these to remain somewhat functional. > > I was only talking about 2. the whole time, sorry if that wasn't clear. > With that said, see below for the discussion about cpuset/hotplug. Yep, I was referring to 2 too. I found out about the app that embeds the 32 bit binary, it was our major concern if we go with user space managing affinities. > > > We deal with hotplug by not allowing one of the aarch32 cpus from going > > offline. > > Sure, but that would only work if we have that 32 bit CPU present in > _all_ cpusets, no? What I'd like to avoid is to keep a (big) 32 > bit CPU in the background cpuset of 64 bit tasks. That would make that > big CPU available to _all_ 64 bit apps in the background, whether they > need 32 bit support or not, because again we cannot distinguish them. > And yeah, I expect this to be not go down well in practice. > > > So, if we're going to support this, a requirement for Android is that > some cpusets will be 64 bit only, and it's possible that we'll exec into > 32 bit from within these cpusets. It's an edge case, we don't really > want to optimize for it, but it needs to not fall apart completely. > I'm not fundamentally against doing smarter things at all, I'm saying we > (Android) just don't _need_ smarter things ATM, so we may want to keep > it simple. Fair enough. But in that case I find it neater to fix the affinities up in the arch code as a specific solution. I'm not seeing there's a difference in the end results between the two implementations if we don't address these issues :( > > My point in the previous message is, if we're accepting this for exec, > a logical next step could be to accept it for cpuset migrations too. > Failing the cgroup migration is hard since: there is no guarantee the > source cpuset has 32 bit CPUs anyway (assuming the exec'd task is kept > in the same cpuset), so why bother; userspace just doesn't know there > are 32 bit tasks in an app and would keep trying to migrate it to 64 bit > cpuset over and over again; you could end up with apps being stuck > halfway through a top-app->background transition where some tasks have > migrated but not others, ... > > It's a bit of a mess :/ It is. I think I addressed these concerns in other parts from my previous email. Judging by your reply below I think you see what I was talking about and we're more on the same page now :-) > > > <snip> > > For hotplug we have to make sure a single cpu stays alive. The fallback you're > > talking about should still work the same if the task is not attached to > > a cpuset. Just it has to take the intersection with the > > arch_task_cpu_possible_cpu() into account. > > Yep, agreed, there's probably room for improvement there. > > > For cpusets, if hotunplug results in an empty cpuset, then all tasks are moved > > to the nearest ancestor if I read the code correctly. In our case, only 32bit > > tasks have to move out to retain this behavior. Since now for the first time we > > have tasks that can't run on all cpus. > > > > Which by the way might be the right behavior for 64bit tasks execing 32bit > > binary in a 64bit only cpuset. I suggested SIGKILL'ing them but maybe moving > > them to the nearest ancestor too is more aligned with the behavior above. > > Hmm, I guess that means putting all 32-bit-execd-from-64-bit tasks in > the root group in Android. I'll try and check the implications, but that > might be just fine... Sounds like a sensible behaviour to me anyways. It'd be only the compat tasks that will have to move to root group. And only for those minority of apps that embed a 32bit binary. I think the impact is minimum. And I think the behavior makes sense generically. Thanks! -- Qais Yousef
On 12/01/20 22:30, Will Deacon wrote: > On Tue, Dec 01, 2020 at 03:56:49PM +0000, Quentin Perret wrote: > > On Tuesday 01 Dec 2020 at 14:11:21 (+0000), Qais Yousef wrote: > > > For cpusets, if hotunplug results in an empty cpuset, then all tasks are moved > > > to the nearest ancestor if I read the code correctly. In our case, only 32bit > > > tasks have to move out to retain this behavior. Since now for the first time we > > > have tasks that can't run on all cpus. > > > > > > Which by the way might be the right behavior for 64bit tasks execing 32bit > > > binary in a 64bit only cpuset. I suggested SIGKILL'ing them but maybe moving > > > them to the nearest ancestor too is more aligned with the behavior above. > > > > Hmm, I guess that means putting all 32-bit-execd-from-64-bit tasks in > > the root group in Android. I'll try and check the implications, but that > > might be just fine... Sounds like a sensible behaviour to me anyways. > > I'll look into this -- anything we can do to avoid forcefully resetting the > affinity mask to the arch_task_cpu_possible_mask() is worth considering. Happy to lend a hand, just let me know. Thanks -- Qais Yousef
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 57b5b5d0a5fd..e970737c3ed2 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask) void cpuset_cpus_allowed_fallback(struct task_struct *tsk) { + if (!is_in_v2_mode()) + return; /* select_fallback_rq will try harder */ + rcu_read_lock(); - do_set_cpus_allowed(tsk, is_in_v2_mode() ? - task_cs(tsk)->cpus_allowed : cpu_possible_mask); + do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed); rcu_read_unlock(); /*