Message ID: 168842897573.139194.15893960758088950748.stgit@manet.1015granger.net (mailing list archive)
Series: SUNRPC service thread scheduler optimizations
I've been pondering this scheduling mechanism in sunrpc/svc some more, and I wonder if rather than optimising the search, we should eliminate it.

Instead we could have a linked list of idle threads using llist.h.

svc_enqueue_xprt calls llist_del_first() and if the result is not NULL, that thread is deemed busy (because it isn't on the list) and is woken.

choose_victim() could also use llist_del_first(). If nothing is there, it could set a flag which gets cleared by the next thread to go idle; that thread then exits. Some interlock would be needed, but it shouldn't be too hard.

svc_exit_thread() would have difficulty removing itself from the idle list if it wasn't busy. Possibly we could disallow that case (I think sending a signal to a thread can make it exit while idle). Alternatively, we could use llist_del_all() to clear the list, then wake all the threads so that they go back on the list if there is nothing to do and they aren't trying to exit. That is fairly heavy-handed, but it isn't a case that we need to optimise.

If you think that might be worth pursuing, I could have a go at writing the patch - probably on top of all the cleanups in your series, before the xarray is added.

I also wonder if we should avoid waking too many threads at once. If multiple events happen in quick succession, we currently wake multiple threads, and if there is any scheduling delay (which is expected, based on commit 22700f3c6df5 ("SUNRPC: Improve ordering of transport processing")), then by the time the threads wake up there may no longer be work to do, as another thread might have gone idle and taken the work.

Instead we could place a limit on the number of threads waking up - possibly 1 or 3. If the counter is maxed out, don't do a wake-up. When a thread wakes up, it decrements the counter, dequeues some work, and if there is more to do, it wakes another thread.

I imagine the same basic protocol would be used for creating new threads when load is high - start just one at a time, though maybe a new thread would handle a first request before possibly starting another thread. But this is a stretch goal, not the main focus.

Thanks,
NeilBrown
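For concreteness, here is a rough sketch of the llist-based idle list proposed above. It assumes svc_rqst gains an llist_node and svc_pool an llist_head; the rq_idle and sp_idle_threads field names and both helpers are invented for illustration, not code from any tree:

/*
 * Rough sketch only. Per the table at the top of llist.h,
 * llist_del_first() callers must be serialised against each other and
 * against llist_del_all(); llist_add() is always lock-free.
 */
#include <linux/llist.h>
#include <linux/sched.h>

/* Pop one idle thread and wake it; NULL means every thread is busy. */
static bool svc_wake_one_idle(struct svc_pool *pool)
{
	struct llist_node *node = llist_del_first(&pool->sp_idle_threads);
	struct svc_rqst *rqstp;

	if (!node)
		return false;
	rqstp = llist_entry(node, struct svc_rqst, rq_idle);
	/* Off the list == deemed busy. */
	wake_up_process(rqstp->rq_task);
	return true;
}

/* A thread with nothing left to do puts itself on the idle list
 * before sleeping. */
static void svc_thread_go_idle(struct svc_pool *pool, struct svc_rqst *rqstp)
{
	/* Set task state before publishing idleness, so a wake-up that
	 * races with us is not lost: if svc_wake_one_idle() runs
	 * between llist_add() and schedule(), wake_up_process() puts
	 * us back to TASK_RUNNING and schedule() returns promptly. */
	set_current_state(TASK_IDLE);
	llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
	schedule();
	__set_current_state(TASK_RUNNING);
}

One incidental property worth noting: llist is LIFO (llist_add() and llist_del_first() both operate on the head), so the most recently idled thread is picked first, which goes some way toward the cache-friendliness concern raised in the reply below.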
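An equally rough sketch of the wake-up cap described above, reusing the hypothetical svc_wake_one_idle() from the previous sketch; the sp_pending_wakeups counter, the helper names, and the limit of 3 are all illustrative:

#include <linux/atomic.h>

#define SVC_MAX_PENDING_WAKEUPS	3

static void svc_pool_maybe_wake(struct svc_pool *pool)
{
	/* Don't wake another thread if enough have already been woken
	 * but have not yet started dequeueing work. */
	if (atomic_inc_return(&pool->sp_pending_wakeups) >
	    SVC_MAX_PENDING_WAKEUPS) {
		atomic_dec(&pool->sp_pending_wakeups);
		return;
	}
	if (!svc_wake_one_idle(pool))
		atomic_dec(&pool->sp_pending_wakeups);
}

/* Each woken thread consumes one credit, dequeues some work, and
 * wakes a further thread only if work remains. */
static void svc_thread_has_woken(struct svc_pool *pool)
{
	atomic_dec(&pool->sp_pending_wakeups);
	if (svc_pool_work_pending(pool))	/* hypothetical helper */
		svc_pool_maybe_wake(pool);
}

Whether 1 or 3 is the right cap would need measurement; the point is only that wake-ups in excess of the budget are skipped rather than queued.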
On Wed, Jul 05, 2023 at 11:08:32AM +1000, NeilBrown wrote:
> I've been pondering this scheduling mechanism in sunrpc/svc some more,
> and I wonder if rather than optimising the search, we should eliminate
> it.
>
> Instead we could have a linked list of idle threads using llist.h.
>
> svc_enqueue_xprt calls llist_del_first() and if the result is not
> NULL, that thread is deemed busy (because it isn't on the list) and is
> woken.
>
> choose_victim() could also use llist_del_first(). If nothing is there,
> it could set a flag which gets cleared by the next thread to go idle;
> that thread then exits. Some interlock would be needed, but it
> shouldn't be too hard.
>
> svc_exit_thread() would have difficulty removing itself from the idle
> list if it wasn't busy. Possibly we could disallow that case (I think
> sending a signal to a thread can make it exit while idle).
> Alternatively, we could use llist_del_all() to clear the list, then
> wake all the threads so that they go back on the list if there is
> nothing to do and they aren't trying to exit. That is fairly
> heavy-handed, but it isn't a case that we need to optimise.
>
> If you think that might be worth pursuing, I could have a go at
> writing the patch - probably on top of all the cleanups in your
> series, before the xarray is added.

The thread pool is effectively a cached resource, so it is a use case that fits llist well. svcrdma uses llist in a few spots in that very capacity. If you think you can meet all of the criteria in the table at the top of llist.h so that thread scheduling works entirely without a lock, that might be an interesting point of comparison.

My only concern is that the current set of candidate mechanisms manage to use mostly the first thread, rather than round-robining through the thread list. Using mostly one process tends to be more cache-friendly. An llist-based thread scheduler should try to follow that behavior, IMO.

> I also wonder if we should avoid waking too many threads at once. If
> multiple events happen in quick succession, we currently wake multiple
> threads, and if there is any scheduling delay (which is expected,
> based on commit 22700f3c6df5 ("SUNRPC: Improve ordering of transport
> processing")), then by the time the threads wake up there may no
> longer be work to do, as another thread might have gone idle and taken
> the work.

It might be valuable to add some observability of wake-ups that find nothing to do. I'll look into that.

> Instead we could place a limit on the number of threads waking up -
> possibly 1 or 3. If the counter is maxed out, don't do a wake-up.
> When a thread wakes up, it decrements the counter, dequeues some work,
> and if there is more to do, it wakes another thread.

I consider reducing the wake-up rate to be the next step for improving RPC service thread scalability. Any experimentation in that area is worth looking into.

> I imagine the same basic protocol would be used for creating new
> threads when load is high - start just one at a time, though maybe a
> new thread would handle a first request before possibly starting
> another thread.

I envision dynamically tuning the pool thread count as something that should be managed in user space, since it's a policy rather than a mechanism. That could be a problem, though, if we wanted to shut down a few pool threads based on shrinker activity.
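A sketch of the kind of observability mentioned above, assuming a per-pool percpu counter alongside the pool's existing statistics (such as sp_threads_woken); the sp_wakeups_empty field and the helper are invented for illustration:

#include <linux/percpu_counter.h>

/* Hypothetical per-pool statistic: count wake-ups where the woken
 * thread found nothing to dequeue. */
static void svc_account_wakeup(struct svc_pool *pool, bool found_work)
{
	if (!found_work)
		percpu_counter_inc(&pool->sp_wakeups_empty);
}

Exporting that through the existing pool stats would make it easy to see how often scheduling delay turns a wake-up into a no-op.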