Message ID | 1433516477-5153-8-git-send-email-pmladek@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello, Petr. On Fri, Jun 05, 2015 at 05:01:06PM +0200, Petr Mladek wrote: > Many kthreads already calls set_freezable() before they enter the main > cycle. One of the reasons for creating iterant kthreads is to create > a safe point for freezing and make even more kthreads properly > freezable. Therefore it would make sense to set all iterant > kthreads freezable by default. Actually, for most cases, making kthreads freezable is unnecessary and often indicative of something going wrong. This is a crude mechanism which goes along the line of "if all threads are stopped, the machine should be safe to be put into whatever state", which isn't true at all as there usually are a lot of stuff going on asynchronously especially when interacting with hardware. In most cases, we want to implement proper power management callbacks which plug new issuance of whatever work-unit the code is dealing with and drain in-flight ones. Whether the worker threads are frozen or not doesn't matter once that's implemented. It seems that people have been marking kthreads freezable w/o really thinking about it - some of them are subtly broken due to missing drainage of in-flight things while others simply don't need freezing for correctness. We do want to clean up freezer usage in the kernel but definitely do not want to make kthreads freezable by default especially given that the freezer essentially is one giant lockdep-less system-wide lock. > However some kthreads might be hard to make properly freezable. > For example, if they do non-interruptible sleeps. They would > need to explicitly clear PF_NOFREEZE flag in the init() call. > But it should be avoided whenever possible. So, big no here. Thanks.
On Tue 2015-06-09 16:20:03, Tejun Heo wrote: > Hello, Petr. > > On Fri, Jun 05, 2015 at 05:01:06PM +0200, Petr Mladek wrote: > > Many kthreads already calls set_freezable() before they enter the main > > cycle. One of the reasons for creating iterant kthreads is to create > > a safe point for freezing and make even more kthreads properly > > freezable. Therefore it would make sense to set all iterant > > kthreads freezable by default. > > Actually, for most cases, making kthreads freezable is unnecessary and > often indicative of something going wrong. This is a crude mechanism > which goes along the line of "if all threads are stopped, the machine > should be safe to be put into whatever state", which isn't true at all > as there usually are a lot of stuff going on asynchronously especially > when interacting with hardware. I think that the interaction with the hardware should be the reason to make them properly freezable. In the current state they are stopped at some random place, they might either miss some event from the hardware or the hardware might get resumed into another state and the kthread might wait forever. Also I think that freezing kthreads on some well defined location should help with reproducing and fixing problems. Note that _only_ kthreads using the _new API_ should be freezable by default. The move need to be done carefully. It is chance to review and clean up the freezer stuff. Of course, I might be too optimistic here. > In most cases, we want to implement proper power management callbacks > which plug new issuance of whatever work-unit the code is dealing with > and drain in-flight ones. Whether the worker threads are frozen or > not doesn't matter once that's implemented. I see. The power management is important here. > It seems that people have been marking kthreads freezable w/o really > thinking about it - some of them are subtly broken due to missing > drainage of in-flight things while others simply don't need freezing > for correctness. I have similar feeling. > We do want to clean up freezer usage in the kernel but definitely do > not want to make kthreads freezable by default especially given that > the freezer essentially is one giant lockdep-less system-wide lock. I think that we both want to clean up freezing. I would like to make it more deterministic and you suggest to make it more relaxed. Do I understand it correctly? Best Regards, Petr PS: Thanks a lot everyone for feedback. I try hard to get it somehow sorted and more confident about the conclusions. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, Petr. On Tue, Jun 09, 2015 at 05:53:13PM +0200, Petr Mladek wrote: > I think that the interaction with the hardware should be the reason to > make them properly freezable. In the current state they are stopped at > some random place, they might either miss some event from the hardware > or the hardware might get resumed into another state and the kthread > might wait forever. Yes, IIUC, there are two classes of use cases where freezing kthreads makes sense. * While hibernating, freezing writeback workers and whoever else which might issue IOs. This is because we have to thaw devices to issue IOs to write out the prepared hibernation image. If new IOs are issued from page cache or whereever when the devices are thawed for writing out the hibernation image, the hibernation image and the data on the disk deviate. Note that this only works because currently none of the block drivers which may be used to write out hibernation images depend on freezable kthreads and hopefully nobody issues IOs from unfreezable kthreads or bh or softirq, which, I think, can happen with preallocated requests or bio based devices. This is a very brittle approach. Instead of controlling things where it actually can be controlled - the IO ingress point - we're trying to control all its users with a pretty blunt tool while at the same time depending on the fact that some of the low level subsystems don't happen to make use of the said blunt tool. I think what we should do is simply blocking all non-image IOs from the block layer while writing out hibernation image. It'd be simpler and a lot more reliable. * Device drivers which interact with their devices in a fully synchronous manner so that blocking the kthread always reliably quiesces the device. For this to be true, the device shouldn't carry over any state across the freezing points. There are drivers which are actually like this and it works for them. However, my impression is that people don't really scrutinize why freezer works for a specific case and tend to spray them all over and develop a fuzzy sense that freezing will somehow magically make the driver ready for PM operatoins. It doesn't help that freezer is such a blunt tool and our historical usage has always been fuzzy - in the earlier days, we depended on freezer even when we knew it wasn't sufficient probably because updating all subsystems and drivers were such a huge undertaking and freezer kinda sorta works in many cases. IMHO we'd be a lot better served by blocking the kthread from PM callbacks explicitly for these drivers to unify control points for PM operations and make what's going on a lot more explicit. This will surely involve a bit more boilerplate code but with the right helpers it should be mostly trivial and I believe that it's likely to encourage higher quality PM operations why getting rid of this fuzzy feeling around the freezer. For both cases, I don't really think kthread freezer is a good solution. This is a blunt enough tool to hide problems in most but not all cases while unnecessarily obscuring what's going on from developers. I personally strongly think that we eventually need to get rid of the kernel part of freezer. > Also I think that freezing kthreads on some well defined location > should help with reproducing and fixing problems. > > Note that _only_ kthreads using the _new API_ should be freezable by > default. The move need to be done carefully. It is chance to review > and clean up the freezer stuff. > > Of course, I might be too optimistic here. I'm strongly against this. We really don't want to make it even fuzzier. There are finite things which need to be controlled for PM operations and I believe what we need to do is identifying them and implementing explicit and clear control mechanisms for them, not spreading this feel-good mechanism even more, further obscuring where those points are. This becomes the game of "is it frozen ENOUGH yet?" and that "enough" is extremely difficult to determine as we're not looking at the choke spots at all and freezable kthreads only cover part of kernel activity. The fuzzy enoughness also actually inhibits development of proper mechanisms - "I believe this is frozen ENOUGH at this point so it is probably safe to assume that X, Y and Z aren't happening anymore" and it usually is except when it's not. Let's please not spread this even more. > > In most cases, we want to implement proper power management callbacks > > which plug new issuance of whatever work-unit the code is dealing with > > and drain in-flight ones. Whether the worker threads are frozen or > > not doesn't matter once that's implemented. > > I see. The power management is important here. That's the only reason we have freezer at all. > > It seems that people have been marking kthreads freezable w/o really > > thinking about it - some of them are subtly broken due to missing > > drainage of in-flight things while others simply don't need freezing > > for correctness. > > I have similar feeling. > > > We do want to clean up freezer usage in the kernel but definitely do > > not want to make kthreads freezable by default especially given that > > the freezer essentially is one giant lockdep-less system-wide lock. > > I think that we both want to clean up freezing. I would like to make > it more deterministic and you suggest to make it more relaxed. > Do I understand it correctly? I'm not sure I'm trying to make it more relaxed. I just don't want it to spread uncontrolled. Thanks a lot.
On Wed 2015-06-10 13:31:54, Tejun Heo wrote: > Hello, Petr. > > On Tue, Jun 09, 2015 at 05:53:13PM +0200, Petr Mladek wrote: > > I think that the interaction with the hardware should be the reason to > > make them properly freezable. In the current state they are stopped at > > some random place, they might either miss some event from the hardware > > or the hardware might get resumed into another state and the kthread > > might wait forever. > > Yes, IIUC, there are two classes of use cases where freezing kthreads > makes sense. First, thanks a lot for detailed explanation. > * While hibernating, freezing writeback workers and whoever else which > might issue IOs. This is because we have to thaw devices to issue > IOs to write out the prepared hibernation image. If new IOs are > issued from page cache or whereever when the devices are thawed for > writing out the hibernation image, the hibernation image and the > data on the disk deviate. Just an idea. I do not say that it is a good solution. If we already thaw devices needed to write the data. It should be possible to thaw also kthreads needed to write the data. > Note that this only works because currently none of the block > drivers which may be used to write out hibernation images depend on > freezable kthreads and hopefully nobody issues IOs from unfreezable > kthreads or bh or softirq, which, I think, can happen with > preallocated requests or bio based devices. It means that some kthreads must be stopped, some must be available, and we do not mind about the rest. I wonder which group is bigger. It might help to decide about the more safe default. It is not easy for me to decide :-/ > This is a very brittle approach. Instead of controlling things > where it actually can be controlled - the IO ingress point - we're > trying to control all its users with a pretty blunt tool while at > the same time depending on the fact that some of the low level > subsystems don't happen to make use of the said blunt tool. > > I think what we should do is simply blocking all non-image IOs from > the block layer while writing out hibernation image. It'd be > simpler and a lot more reliable. This sounds like an interesting idea to me. Do you already have some more precise plan in mind? Unfortunately, for me it is not easy to judge it. I wonder how many interfaces would need to get hacked to make this working. Also I wonder if they would be considered as a hack or a clean solution by the affected parties. By other words, I wonder if it is realistic. > * Device drivers which interact with their devices in a fully > synchronous manner so that blocking the kthread always reliably > quiesces the device. For this to be true, the device shouldn't > carry over any state across the freezing points. There are drivers > which are actually like this and it works for them. I am trying to sort it in my head. In each case, we need to stop these kthreads in some well defined state. Also the kthread (or device) must be able to block the freezing if they are not in the state yet. The stop points are defined by try_to_freeze() now. And freezable kthreads block the freezer until they reach these points. > However, my impression is that people don't really scrutinize why > freezer works for a specific case and tend to spray them all over > and develop a fuzzy sense that freezing will somehow magically make > the driver ready for PM operatoins. It doesn't help that freezer is > such a blunt tool and our historical usage has always been fuzzy - > in the earlier days, we depended on freezer even when we knew it > wasn't sufficient probably because updating all subsystems and > drivers were such a huge undertaking and freezer kinda sorta works > in many cases. I try to better understand why freezer is considered to be a blunt tool. Is it because it is a generic API, try_to_freeze() is put on "random" locations, so that it does not define the safe point precisely enough? > IMHO we'd be a lot better served by blocking the kthread from PM > callbacks explicitly for these drivers to unify control points for > PM operations and make what's going on a lot more explicit. This > will surely involve a bit more boilerplate code but with the right > helpers it should be mostly trivial and I believe that it's likely > to encourage higher quality PM operations why getting rid of this > fuzzy feeling around the freezer. I agree. There is needed some synchronization between the device drivers and kthread to make sure that they are on the same note. If I get this correctly. It works the following way now. PM notifies both kthreads (via freezer) and device drivers (via callbacks) about that the suspend is in progress. They are supposed to go into a sane state. But there is no explicit communication between the kthread and the driver. What do you exactly mean with the callbacks? Should they set some flags that would navigate the kthread into some state? > I'm strongly against this. We really don't want to make it even > fuzzier. There are finite things which need to be controlled for PM > operations and I believe what we need to do is identifying them and > implementing explicit and clear control mechanisms for them, not > spreading this feel-good mechanism even more, further obscuring where > those points are. I see. This explains why you are so strongly against changing the default. I see the following in the kernel sources: 61 set_freezable() 97 kthread_create() 259 kthread_run() I means that only about 17% kthreads are freezable these days. > This becomes the game of "is it frozen ENOUGH yet?" and that "enough" > is extremely difficult to determine as we're not looking at the choke > spots at all and freezable kthreads only cover part of kernel > activity. The fuzzy enoughness also actually inhibits development of > proper mechanisms - "I believe this is frozen ENOUGH at this point so > it is probably safe to assume that X, Y and Z aren't happening > anymore" and it usually is except when it's not. I am not sure what you mean by frozen enough? I think that a tasks is either frozen or not. Do I miss something, please? Best Regards, Petr -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hey, Petr. On Fri, Jun 12, 2015 at 03:24:40PM +0200, Petr Mladek wrote: > > * While hibernating, freezing writeback workers and whoever else which > > might issue IOs. This is because we have to thaw devices to issue > > IOs to write out the prepared hibernation image. If new IOs are > > issued from page cache or whereever when the devices are thawed for > > writing out the hibernation image, the hibernation image and the > > data on the disk deviate. > > Just an idea. I do not say that it is a good solution. If we already > thaw devices needed to write the data. It should be possible to thaw > also kthreads needed to write the data. Sure, we'd end up having to mark the involved kthreads and whatever they may depend on with "freeze for snapshotting but not for writing out images". > > Note that this only works because currently none of the block > > drivers which may be used to write out hibernation images depend on > > freezable kthreads and hopefully nobody issues IOs from unfreezable > > kthreads or bh or softirq, which, I think, can happen with > > preallocated requests or bio based devices. > > It means that some kthreads must be stopped, some must be available, > and we do not mind about the rest. I wonder which group is bigger. > It might help to decide about the more safe default. It is not easy > for me to decide :-/ Please do not spread freezer more than necessary. It's not about which part is larger but about not completely losing track of what's going on. At least now we can say "this needs freezer because XYZ" and that XYZ actually points to something which needs to be synchronized for PM operations. If we flip the polarity, all we're left with is "this can't be frozen because it deadlocks with XYZ" which is a lot less information, both in terms of relevance and amount, than the other way around. > > This is a very brittle approach. Instead of controlling things > > where it actually can be controlled - the IO ingress point - we're > > trying to control all its users with a pretty blunt tool while at > > the same time depending on the fact that some of the low level > > subsystems don't happen to make use of the said blunt tool. > > > > I think what we should do is simply blocking all non-image IOs from > > the block layer while writing out hibernation image. It'd be > > simpler and a lot more reliable. > > This sounds like an interesting idea to me. Do you already have some > more precise plan in mind? Unfortunately, no. No concrete plans or patches yet but even then I'm pretty sure we don't wanna head the other direction. It's just wrong. > Unfortunately, for me it is not easy to judge it. I wonder how many > interfaces would need to get hacked to make this working. Not much really. We just need a way to flag IOs to write image - be that a bio / request flag or task flag and then a way to tell the block layer to block everything else. > Also I wonder if they would be considered as a hack or a clean > solution by the affected parties. By other words, I wonder if it is > realistic. I'd say it's pretty realistic. This is synchronizing where the operations need to be synchronized. What part would be hackish? > > * Device drivers which interact with their devices in a fully > > synchronous manner so that blocking the kthread always reliably > > quiesces the device. For this to be true, the device shouldn't > > carry over any state across the freezing points. There are drivers > > which are actually like this and it works for them. > > I am trying to sort it in my head. In each case, we need to stop these > kthreads in some well defined state. Also the kthread (or device) > must be able to block the freezing if they are not in the state yet. Whether kthreads need to be stopped or not is periphery once you have a proper blocking / draining mechanism. The thing is most of these kthreads are blocked on requests coming down from upper layers anyway. Once you plug queueing of new requests and drain whatever is in flight, the kthread isn't gonna do anything anyway. > The stop points are defined by try_to_freeze() now. And freezable > kthreads block the freezer until they reach these points. I'm repeating myself but that only works for fully synchronous devices (ie. kthread processing one command at a time). You need to drain whatever is in flight too. You can't do that with simple freezing points. There's a reason why a lot of drivers can't rely on freezer for PM operations. Drivers which can fully depend on freezer would be minority. > > However, my impression is that people don't really scrutinize why > > freezer works for a specific case and tend to spray them all over > > and develop a fuzzy sense that freezing will somehow magically make > > the driver ready for PM operatoins. It doesn't help that freezer is > > such a blunt tool and our historical usage has always been fuzzy - > > in the earlier days, we depended on freezer even when we knew it > > wasn't sufficient probably because updating all subsystems and > > drivers were such a huge undertaking and freezer kinda sorta works > > in many cases. > > I try to better understand why freezer is considered to be a blunt > tool. Is it because it is a generic API, try_to_freeze() is put on > "random" locations, so that it does not define the safe point > precisely enough? Not that. I don't know how to explain it better. Hmmm... okay, let's say there's a shared queue Q and N users o fit. If you wanna make Q empty and keep it that way for a while, the right thing to do is blocking new queueing and then wait till Q drains - you choke the entity that you wanna control. Instead of that, freezer is trying to block the "N" users part. In majority of cases, it blocks enough but it's pretty difficult to be sure whether you actually got all N of them (as some of them may not involve kthreads at all or unfreezable kthreads might end up doing those operations too on corner cases) and it's also not that clear whether blocking the N users actually make Q empty. Maybe there are things which can be in flight asynchronously on Q even all its N users are blocked. This is inherently finicky. > > IMHO we'd be a lot better served by blocking the kthread from PM > > callbacks explicitly for these drivers to unify control points for > > PM operations and make what's going on a lot more explicit. This > > will surely involve a bit more boilerplate code but with the right > > helpers it should be mostly trivial and I believe that it's likely > > to encourage higher quality PM operations why getting rid of this > > fuzzy feeling around the freezer. > > I agree. There is needed some synchronization between the device > drivers and kthread to make sure that they are on the same note. No, not kthreads. They should be synchronizing on the actual entities that they wanna keep empty. Where the kthreads are don't matter at all. It's the wrong thing to control in most cases. In the minority of cases where blocking kthread is enough, we can do that explicitly from the usual PM callbacks. > If I get this correctly. It works the following way now. PM notifies > both kthreads (via freezer) and device drivers (via callbacks) about > that the suspend is in progress. They are supposed to go into a sane > state. But there is no explicit communication between the kthread > and the driver. > > What do you exactly mean with the callbacks? Should they set some > flags that would navigate the kthread into some state? The driver PM callbacks. > > I'm strongly against this. We really don't want to make it even > > fuzzier. There are finite things which need to be controlled for PM > > operations and I believe what we need to do is identifying them and > > implementing explicit and clear control mechanisms for them, not > > spreading this feel-good mechanism even more, further obscuring where > > those points are. > > I see. This explains why you are so strongly against changing the > default. I see the following in the kernel sources: > > 61 set_freezable() > 97 kthread_create() > 259 kthread_run() > > I means that only about 17% kthreads are freezable these days. And some of them probably don't even need it. > > This becomes the game of "is it frozen ENOUGH yet?" and that "enough" > > is extremely difficult to determine as we're not looking at the choke > > spots at all and freezable kthreads only cover part of kernel > > activity. The fuzzy enoughness also actually inhibits development of > > proper mechanisms - "I believe this is frozen ENOUGH at this point so > > it is probably safe to assume that X, Y and Z aren't happening > > anymore" and it usually is except when it's not. > > I am not sure what you mean by frozen enough? I think that a tasks > is either frozen or not. Do I miss something, please? Please refer back to the above Q and N users example. Sure, each kthread is either frozen or thawed; however, whether you got all of N or not is pretty difficult to tell reliably and I'd say a lot of the currently existing freezer usage is pretty fuzzy on that respect. Thanks.
On Sat 2015-06-13 18:22:22, Tejun Heo wrote: > > I try to better understand why freezer is considered to be a blunt > > tool. Is it because it is a generic API, try_to_freeze() is put on > > "random" locations, so that it does not define the safe point > > precisely enough? > > Not that. I don't know how to explain it better. Hmmm... okay, let's > say there's a shared queue Q and N users o fit. If you wanna make Q > empty and keep it that way for a while, the right thing to do is > blocking new queueing and then wait till Q drains - you choke the > entity that you wanna control. > > Instead of that, freezer is trying to block the "N" users part. In > majority of cases, it blocks enough but it's pretty difficult to be > sure whether you actually got all N of them (as some of them may not > involve kthreads at all or unfreezable kthreads might end up doing > those operations too on corner cases) and it's also not that clear > whether blocking the N users actually make Q empty. Maybe there are > things which can be in flight asynchronously on Q even all its N users > are blocked. This is inherently finicky. I feel convinced that it does not make sense to make kthreads freezable by default and that we should not use it when not necessary. Thanks a lot for patience and so detailed explanation. Best Regards, Petr -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/kthread.c b/kernel/kthread.c index 185902d0e293..e34f67cb8ecf 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -412,13 +412,20 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), * * This function defines the work flow of iterant kthreads. It calls * the given callbacks. Also it defines and implements a common - * check point for freezing and stopping. + * check point for freezing, stopping, and signal handling. + * + * IMPORTANT: Iterant kthreads are freezable by default. The functions + * defined in the given struct kthread_iterant still have to explicitly + * call try_to_freeze() after each interruptible sleep. They must avoid + * non-interruptible sleep if freezing is allowed. */ static int kthread_iterant_fn(void *kti_ptr) { struct kthread_iterant *kti = kti_ptr; void *data = kti->data; + set_freezable(); + if (kti->init) kti->init(data);
Many kthreads already calls set_freezable() before they enter the main cycle. One of the reasons for creating iterant kthreads is to create a safe point for freezing and make even more kthreads properly freezable. Therefore it would make sense to set all iterant kthreads freezable by default. However some kthreads might be hard to make properly freezable. For example, if they do non-interruptible sleeps. They would need to explicitly clear PF_NOFREEZE flag in the init() call. But it should be avoided whenever possible. Signed-off-by: Petr Mladek <pmladek@suse.cz> --- kernel/kthread.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)