Message ID | 20210115155506.2d59fe83.yaoaili@kingsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm,hwpoison: non-current task should be checked early_kill for force_early | expand |
On Fri, Jan 15, 2021 at 03:55:06PM +0800, Aili Yao wrote: > Hello,From 740148005f1333150ee1e119ed8a34454abb7c1d Mon Sep 17 00:00:00 2001 > From: Aili Yao <yaoaili@kingsoft.com> > Date: Fri, 15 Jan 2021 14:56:06 +0800 > Subject: [PATCH] mm,hwpoison: non-current task should be checked early_kill > for force_early > > Other process may also care the error info with the early-kill flag set. > when force_early is set, and if tsk is not current, leave it to the > early-kill flag check. I am having a hard time trying to grasp what are you trying to achieve here. Could you elaborate some more? Ideally stating what is the problem you are fixing here. I was also a bit confused about task_early_kill. So, if force_early is true (aka. MF_ACTION_REQUIRED was set), we only return a task struct if we find the task or a thread belonging to the task (sharing the mm). If it is not set by the caller of memory_failure, we still want to check the task/threads' MCE policy to check whether PF_MCE_KILL_EARLY it was set with prctl, right? That makes sense to me. So, back to your change, if force_early was set, but the main task does not match, you still want to check whether any thread belonging to the main task has the policy set?
On Fri, 15 Jan 2021 09:49:24 +0100 Oscar Salvador <osalvador@suse.de> wrote: > I am having a hard time trying to grasp what are you trying to achieve here. > Could you elaborate some more? Ideally stating what is the problem you are > fixing here. > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. this is what I want to fix. > If it is not set by the caller of memory_failure, we still want to check the > task/threads' MCE policy to check whether PF_MCE_KILL_EARLY it was set with > prctl, right? > Yes, right! > That makes sense to me. > > So, back to your change, if force_early was set, but the main task does not > match, you still want to check whether any thread belonging to the main task > has the policy set? > Yes.
On Fri, 15 Jan 2021 17:26:22 +0800 Aili Yao <yaoaili@kingsoft.com> wrote: > > So, back to your change, if force_early was set, but the main task does not > > match, you still want to check whether any thread belonging to the main task > > has the policy set? > > > Yes. > Sorry, the code only loop processes, not thread.
On Fri, Jan 15, 2021 at 05:31:25PM +0800, Aili Yao wrote: > On Fri, 15 Jan 2021 17:26:22 +0800 > Aili Yao <yaoaili@kingsoft.com> wrote: > > > > So, back to your change, if force_early was set, but the main task does not > > > match, you still want to check whether any thread belonging to the main task > > > has the policy set? > > > > > Yes. > > > Sorry, the code only loop processes, not thread. The loop in task_early_kill goes through processes, while the one in find_early_kill_thread goes through threads, right?
On Fri, 15 Jan 2021 10:40:37 +0100 Oscar Salvador <osalvador@suse.de> wrote: > On Fri, Jan 15, 2021 at 05:31:25PM +0800, Aili Yao wrote: > > On Fri, 15 Jan 2021 17:26:22 +0800 > > Aili Yao <yaoaili@kingsoft.com> wrote: > > > > > > So, back to your change, if force_early was set, but the main task does not > > > > match, you still want to check whether any thread belonging to the main task > > > > has the policy set? > > > > > > > Yes. > > > > > Sorry, the code only loop processes, not thread. > > The loop in task_early_kill goes through processes, while the one in > find_early_kill_thread goes through threads, right? > I think you are right, to early_kill, the code wants pick the right thread of one process.
On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote: > On Fri, 15 Jan 2021 09:49:24 +0100 > Oscar Salvador <osalvador@suse.de> wrote: > > > I am having a hard time trying to grasp what are you trying to achieve here. > > Could you elaborate some more? Ideally stating what is the problem you are > > fixing here. > > > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into > there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one > UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain > alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. > this is what I want to fix. This should really be part of the changelog. (what is it fixing? how? why?) All this information should be there. And the title should be renamed as well as it is bit confusing as is. "Do not bail out in task_early_kill when force_early is set" or something along those lines.
Hi Aili, On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote: > On Fri, 15 Jan 2021 09:49:24 +0100 > Oscar Salvador <osalvador@suse.de> wrote: > > > I am having a hard time trying to grasp what are you trying to achieve here. > > Could you elaborate some more? Ideally stating what is the problem you are > > fixing here. > > > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into > there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one > UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain > alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. This behavior seems not to me what PF_MCE_KILL_EARLY intends. This flag controls whether memory error handler kills processes immediately or not, and it only affects action optional cases (i.e. called without MF_ACTION_REQUIRED). In MF_ACTION_REQUIRED case, we have no such choice and affected processes should be always killed immediately. We may also need to consider the difference in context of these two cases. Action optional case is called asynchronously by background process like memory scrubbing, so all processes mapping the error memory are the affected ones. Action required event is more synchronous, and is called when a process experiences memory access errors on data load and instruction fetch instructions. So the affected process in this case is only the process. So I still think the this background justifies the current behavior. But my knowledge might be old, if you have newer hardwares which define other type of memory error and that doesn't fit with current implementation, I'd like to extend code to support the new cases, so please let me know. Thanks, Naoya Horiguchi
On Mon, 18 Jan 2021 05:15:55 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > Hi Aili, > > On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote: > > On Fri, 15 Jan 2021 09:49:24 +0100 > > Oscar Salvador <osalvador@suse.de> wrote: > > > > > I am having a hard time trying to grasp what are you trying to achieve here. > > > Could you elaborate some more? Ideally stating what is the problem you are > > > fixing here. > > > > > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into > > there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one > > UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain > > alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. > > This behavior seems not to me what PF_MCE_KILL_EARLY intends. This flag > controls whether memory error handler kills processes immediately or not, > and it only affects action optional cases (i.e. called without > MF_ACTION_REQUIRED). In MF_ACTION_REQUIRED case, we have no such choice > and affected processes should be always killed immediately. > > We may also need to consider the difference in context of these two cases. > Action optional case is called asynchronously by background process like > memory scrubbing, so all processes mapping the error memory are the affected > ones. Action required event is more synchronous, and is called when a > process experiences memory access errors on data load and instruction fetch > instructions. So the affected process in this case is only the process. > So I still think the this background justifies the current behavior. > > But my knowledge might be old, if you have newer hardwares which define > other type of memory error and that doesn't fit with current implementation, > I'd like to extend code to support the new cases, so please let me know. > Sorry, I don't fully get your concern. For Action optional cases, It's may from CE storm or patrol scrub, when the process want to process this condition, it will set PF_MCE_KILL_EARLY, and it will be signaled for such case. For Action Required cases,we must do something, I think it's more urgent and serious, In the current code, the process triggered the Error Should be signaled. but the process with PF_MCE_KILL_EARLY won't get signaled, just because PF_MCE_KILL_EARLY is for action optional case? Action Required is for current we must handle, the same Action Required issue is Action optional for non-current processes, Right? I don't think Action Required is for all processes, For current processes , it may be AR, for other process, it may be AO, and they should also be signaled, I think this behavior its reasonable. And we can't determine which error will be triggered, the PF_MCE_KILL_EARLY fLAG is meant to handle memory error gracefully and won't be restricted to explicitly declared AO errors. Thanks!
On Mon, Jan 18, 2021 at 01:57:44PM +0800, Aili Yao wrote: > On Mon, 18 Jan 2021 05:15:55 +0000 > HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > Hi Aili, > > > > On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote: > > > On Fri, 15 Jan 2021 09:49:24 +0100 > > > Oscar Salvador <osalvador@suse.de> wrote: > > > > > > > I am having a hard time trying to grasp what are you trying to achieve here. > > > > Could you elaborate some more? Ideally stating what is the problem you are > > > > fixing here. > > > > > > > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into > > > there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one > > > UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain > > > alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. > > > > This behavior seems not to me what PF_MCE_KILL_EARLY intends. This flag > > controls whether memory error handler kills processes immediately or not, > > and it only affects action optional cases (i.e. called without > > MF_ACTION_REQUIRED). In MF_ACTION_REQUIRED case, we have no such choice > > and affected processes should be always killed immediately. > > > > We may also need to consider the difference in context of these two cases. > > Action optional case is called asynchronously by background process like > > memory scrubbing, so all processes mapping the error memory are the affected > > ones. Action required event is more synchronous, and is called when a > > process experiences memory access errors on data load and instruction fetch > > instructions. So the affected process in this case is only the process. > > So I still think the this background justifies the current behavior. > > > > But my knowledge might be old, if you have newer hardwares which define > > other type of memory error and that doesn't fit with current implementation, > > I'd like to extend code to support the new cases, so please let me know. > > > Sorry, I don't fully get your concern. > > For Action optional cases, It's may from CE storm or patrol scrub, ... hwpoison is not about corrected errors, but about uncorrected errors. CE storm should be handled by CMCI and userspace tool like mcelog, although it seems not current main topic, sorry for nitpick. > when the process want to process this condition, > it will set PF_MCE_KILL_EARLY, and it will be signaled for such case. > For Action Required cases,we must do something, I think it's more urgent and serious, In the current code, the process triggered the Error > Should be signaled. but the process with PF_MCE_KILL_EARLY won't get signaled, just because PF_MCE_KILL_EARLY is for action optional case? I don't use PF_MCE_KILL_EARLY to justify current code. Let me explain more. For action optional cases, one error event kills *only one* process. If an error page are shared by multiple processes, these processes will be killed by separate error events, each of which is triggered when each process tries to access the error memory. So these processes would be killed immediately when accessing the error, but you don't have to kill all at the same time (or actually you might not even have to kill it at all if the process exits finally without accessing the error later). Maybe the function variable "force_early" is named confusingly (it sounds that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). I'll submit a fix later. (I'll add your "Reported-by" because you made me find it, thank you.) > > Action Required is for current we must handle, the same Action Required issue is Action optional for non-current processes, Right? Right. > I don't think Action Required is for all processes, For current processes , it may be AR, for other process, it may be AO, and they should also > be signaled, I think this behavior its reasonable. > > And we can't determine which error will be triggered, the PF_MCE_KILL_EARLY fLAG is meant to handle memory error gracefully and won't be restricted > to explicitly declared AO errors. > > Thanks! Thank you, too. - Naoya
On Mon, 18 Jan 2021 06:50:54 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > On Mon, Jan 18, 2021 at 01:57:44PM +0800, Aili Yao wrote: > > On Mon, 18 Jan 2021 05:15:55 +0000 > > HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > > > Hi Aili, > > > > > > On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote: > > > > On Fri, 15 Jan 2021 09:49:24 +0100 > > > > Oscar Salvador <osalvador@suse.de> wrote: > > > > > > > > > I am having a hard time trying to grasp what are you trying to achieve here. > > > > > Could you elaborate some more? Ideally stating what is the problem you are > > > > > fixing here. > > > > > > > > > Sorry for confusion, example: there are four process A,B,C,D,which map the same file into > > > > there process space, which set there PF_MCE_KILL_EARLY flag to TRUE, if process A trigger one > > > > UE with MF_ACTION_REQUIRED set, in current code, only process A will be killed, B,C,D remain > > > > alive, but for the PF_MCE_KILL_EARLY we set, we want B,C,D also be killed. > > > > > > This behavior seems not to me what PF_MCE_KILL_EARLY intends. This flag > > > controls whether memory error handler kills processes immediately or not, > > > and it only affects action optional cases (i.e. called without > > > MF_ACTION_REQUIRED). In MF_ACTION_REQUIRED case, we have no such choice > > > and affected processes should be always killed immediately. > > > > > > We may also need to consider the difference in context of these two cases. > > > Action optional case is called asynchronously by background process like > > > memory scrubbing, so all processes mapping the error memory are the affected > > > ones. Action required event is more synchronous, and is called when a > > > process experiences memory access errors on data load and instruction fetch > > > instructions. So the affected process in this case is only the process. > > > So I still think the this background justifies the current behavior. > > > > > > But my knowledge might be old, if you have newer hardwares which define > > > other type of memory error and that doesn't fit with current implementation, > > > I'd like to extend code to support the new cases, so please let me know. > > > > > Sorry, I don't fully get your concern. > > > > For Action optional cases, It's may from CE storm or patrol scrub, ... > > hwpoison is not about corrected errors, but about uncorrected errors. CE storm > should be handled by CMCI and userspace tool like mcelog, although it seems not > current main topic, sorry for nitpick. > When hard page offline is configured, CE will also call memory-failure > > when the process want to process this condition, > > it will set PF_MCE_KILL_EARLY, and it will be signaled for such case. > > For Action Required cases,we must do something, I think it's more urgent and serious, In the current code, the process triggered the Error > > Should be signaled. but the process with PF_MCE_KILL_EARLY won't get signaled, just because PF_MCE_KILL_EARLY is for action optional case? > > I don't use PF_MCE_KILL_EARLY to justify current code. Let me explain more. > > For action optional cases, one error event kills *only one* process. If an > error page are shared by multiple processes, these processes will be killed > by separate error events, each of which is triggered when each process tries > to access the error memory. So these processes would be killed immediately > when accessing the error, but you don't have to kill all at the same time > (or actually you might not even have to kill it at all if the process exits > finally without accessing the error later). > It's not the way PF_MCE_KILL_EARLY want, normally one action optional without PF_MCE_KILL_EARLY will be signaled when it really access it, when PF_MCE_KILL_EARLY set, we may not just want be killed, wo may capture the signal and do some thing more. > Maybe the function variable "force_early" is named confusingly (it sounds > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > I'll submit a fix later. (I'll add your "Reported-by" because you made me > find it, thank you.) > not related to force_early, this is about the memory action we take for error , but if you have a better one, that's will be good. > > > > Action Required is for current we must handle, the same Action Required issue is Action optional for non-current processes, Right? > > Right. > > > I don't think Action Required is for all processes, For current processes , it may be AR, for other process, it may be AO, and they should also > > be signaled, I think this behavior its reasonable. > > > > And we can't determine which error will be triggered, the PF_MCE_KILL_EARLY fLAG is meant to handle memory error gracefully and won't be restricted > > to explicitly declared AO errors. > > Thanks
On Mon, 18 Jan 2021 06:50:54 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > For action optional cases, one error event kills *only one* process. If an > error page are shared by multiple processes, these processes will be killed > by separate error events, each of which is triggered when each process tries > to access the error memory. So these processes would be killed immediately > when accessing the error, but you don't have to kill all at the same time > (or actually you might not even have to kill it at all if the process exits > finally without accessing the error later). > > Maybe the function variable "force_early" is named confusingly (it sounds > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > I'll submit a fix later. (I'll add your "Reported-by" because you made me > find it, thank you.) > I think we should do more for non current process error case, we should mark it AO for processes to be signaled or we may take wrong action.
On Mon, Jan 18, 2021 at 04:15:12PM +0800, Aili Yao wrote: > On Mon, 18 Jan 2021 06:50:54 +0000 > HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > > > For action optional cases, one error event kills *only one* process. If an > > error page are shared by multiple processes, these processes will be killed > > by separate error events, each of which is triggered when each process tries > > to access the error memory. So these processes would be killed immediately > > when accessing the error, but you don't have to kill all at the same time > > (or actually you might not even have to kill it at all if the process exits > > finally without accessing the error later). > > > > Maybe the function variable "force_early" is named confusingly (it sounds > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > > I'll submit a fix later. (I'll add your "Reported-by" because you made me > > find it, thank you.) > > > I think we should do more for non current process error case, we should mark it AO for processes to be signaled > or we may take wrong action. I'm not sure what you mean by "non current process error case" and "we should mark it AO", so could you explain more specifically about your error scenario? Especially I'd like to know about who triggers hard offline on what hardware events and what "wrong action" could happen. Maybe just "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because it's not enough for us to see that your scenario is possible. Current implementation implicitly assumes some hardware behavior, and does not work for the case which never happens under the assumption. Do you have some test cases to reproduce any specific issue (like data lost) on your system? (If yes, please share it.) Or your concern is from code review? Thanks, Naoya Horiguchi
On Mon, 18 Jan 2021 08:57:47 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > > > > For action optional cases, one error event kills *only one* process. If an > > > error page are shared by multiple processes, these processes will be killed > > > by separate error events, each of which is triggered when each process tries > > > to access the error memory. So these processes would be killed immediately > > > when accessing the error, but you don't have to kill all at the same time > > > (or actually you might not even have to kill it at all if the process exits > > > finally without accessing the error later). > > > > > > Maybe the function variable "force_early" is named confusingly (it sounds > > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > > > I'll submit a fix later. (I'll add your "Reported-by" because you made me > > > find it, thank you.) > > > > > I think we should do more for non current process error case, we should mark it AO for processes to be signaled > > or we may take wrong action. > > I'm not sure what you mean by "non current process error case" and "we > should mark it AO", so could you explain more specifically about your error > scenario? I will share my test code and i will submit another patch to this scenario. please give me some time, thanks! And I think you are right, AR is only current process. > Especially I'd like to know about who triggers hard offline on > what hardware events and what "wrong action" could happen. Maybe just > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because > it's not enough for us to see that your scenario is possible. Current > implementation implicitly assumes some hardware behavior, and does not work > for the case which never happens under the assumption. > This action is from mcelog daemon, normally softpage offlie is default, but we can configure hardpage offline for CE storms, to get related processes signaled. > Do you have some test cases to reproduce any specific issue (like data lost) > on your system? (If yes, please share it.) Or your concern is from code review? > I will make it clean, get it shared Thanks
On Mon, Jan 18, 2021 at 08:57:47AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote: > I'm not sure what you mean by "non current process error case" and "we > should mark it AO", so could you explain more specifically about your error > scenario? Especially I'd like to know about who triggers hard offline on > what hardware events and what "wrong action" could happen. Maybe just > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because > it's not enough for us to see that your scenario is possible. Current > implementation implicitly assumes some hardware behavior, and does not work > for the case which never happens under the assumption. So, the scenario case is a multithread application with the same page mapped. And PF_MCE_KILL_EARLY flag was set. IIUIC, Aili Yao concern is that when the MCE machinery calls memory_failure which MF_ACTION_REQUIRED, only the process that triggered the MCE exception will receive a SIGBUG, and not the other threads that might have PF_MCE_EARLY. Aili Yao would like memory_failure() to also signal those threads who might have the flag set, in case they want to do something with that information. But reading the code, I do not think that is what the code expects. Looking at the comment above find_early_kill_thread: "/* * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO) * on behalf of the thread group. Return task_struct of the (first found) * dedicated thread if found, and return NULL otherwise. * * We already hold read_lock(&tasklist_lock) in the caller, so we don't * have to call rcu_read_lock/unlock() in this function. */" What I understand from that is: " If memory_failure() was not triggered by any concrete process (aka: no one was trying to manipulate the corrupted area), we need to find the main thread who might have set the MCE policy by pcrtl and see if they want to be signaled __before__ they access the corrupted area. " Note that if the PF_MCE policy was not set, we check the global knob sysctm_memory_early_kill. And if that is not set either, we defer the signaling till later when a process actually tries to operate the corrupted area. Does that makes sense? Actually, unless I am mistaken, if a multithread process receives a signal, all threads belonging to the process will receive the signal as well: "The signal disposition is a per-process attribute: in a multithreaded application, the disposition of a particular signal is the same for all threads."
On Mon, 18 Jan 2021 10:24:23 +0100 Oscar Salvador <osalvador@suse.de> wrote: > > So, the scenario case is a multithread application with the same page mapped. > And PF_MCE_KILL_EARLY flag was set. > The scenario is not related multithread application; it's about multiprocess application which share the same page > IIUIC, Aili Yao concern is that when the MCE machinery calls memory_failure > which MF_ACTION_REQUIRED, only the process that triggered the MCE exception > will receive a SIGBUG, and not the other threads that might have PF_MCE_EARLY. > Aili Yao would like memory_failure() to also signal those threads who might > have the flag set, in case they want to do something with that information. > For the processes who care the memory with early flag set, they may specify one thread to process related signal such as signal bus, when the flag set, the thread want to handle the error gracefully and not kill the process all, and may do something more. > But reading the code, I do not think that is what the code expects. > Looking at the comment above find_early_kill_thread: > > "/* > * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO) > * on behalf of the thread group. Return task_struct of the (first found) > * dedicated thread if found, and return NULL otherwise. > * > * We already hold read_lock(&tasklist_lock) in the caller, so we don't > * have to call rcu_read_lock/unlock() in this function. > */" > > What I understand from that is: > > " > If memory_failure() was not triggered by any concrete process (aka: no one was > trying to manipulate the corrupted area), we need to find the main thread who > might have set the MCE policy by pcrtl and see if they want to be signaled > __before__ they access the corrupted area. yes, it doesn't just want to be killed all. Thanks
On Mon, Jan 18, 2021 at 05:38:52PM +0800, Aili Yao wrote: > On Mon, 18 Jan 2021 10:24:23 +0100 > Oscar Salvador <osalvador@suse.de> wrote: > > > > > So, the scenario case is a multithread application with the same page mapped. > > And PF_MCE_KILL_EARLY flag was set. > > > The scenario is not related multithread application; > it's about multiprocess application which share the same page Ok, I misunderstood that part. Then, I do not fully understand the concern. I guess you are kind of memory scrubbing with mcelog defaulting to memory_failure instead of soft_offline. E.g: collect_procs_anon will pick the processes which have the MCE policy set and have the page mapped. That is all we should need. So, if you have processes A, B, C and D with X page mapped, and you set the MCE policy on those four, they should get picked up by your scrubbing when handling X page by memory_failure. Is not that right?
On Mon, 18 Jan 2021 08:57:47 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > On Mon, Jan 18, 2021 at 04:15:12PM +0800, Aili Yao wrote: > > On Mon, 18 Jan 2021 06:50:54 +0000 > > HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > > > > > > For action optional cases, one error event kills *only one* process. If an > > > error page are shared by multiple processes, these processes will be killed > > > by separate error events, each of which is triggered when each process tries > > > to access the error memory. So these processes would be killed immediately > > > when accessing the error, but you don't have to kill all at the same time > > > (or actually you might not even have to kill it at all if the process exits > > > finally without accessing the error later). > > > > > > Maybe the function variable "force_early" is named confusingly (it sounds > > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > > > I'll submit a fix later. (I'll add your "Reported-by" because you made me > > > find it, thank you.) > > > > > I think we should do more for non current process error case, we should mark it AO for processes to be signaled > > or we may take wrong action. > > I'm not sure what you mean by "non current process error case" and "we > should mark it AO", so could you explain more specifically about your error > scenario? Especially I'd like to know about who triggers hard offline on > what hardware events and what "wrong action" could happen. Maybe just > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because > it's not enough for us to see that your scenario is possible. Current > implementation implicitly assumes some hardware behavior, and does not work > for the case which never happens under the assumption. > > Do you have some test cases to reproduce any specific issue (like data lost) > on your system? (If yes, please share it.) Or your concern is from code review? Hope this draft test code will be helpful, Thanks
On Mon, Jan 18, 2021 at 05:09:00PM +0800, Aili Yao wrote: > On Mon, 18 Jan 2021 08:57:47 +0000 > HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > > > > > > > > For action optional cases, one error event kills *only one* process. If an > > > > error page are shared by multiple processes, these processes will be killed > > > > by separate error events, each of which is triggered when each process tries > > > > to access the error memory. So these processes would be killed immediately > > > > when accessing the error, but you don't have to kill all at the same time > > > > (or actually you might not even have to kill it at all if the process exits > > > > finally without accessing the error later). > > > > > > > > Maybe the function variable "force_early" is named confusingly (it sounds > > > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect). > > > > I'll submit a fix later. (I'll add your "Reported-by" because you made me > > > > find it, thank you.) > > > > > > > I think we should do more for non current process error case, we should mark it AO for processes to be signaled > > > or we may take wrong action. > > > > I'm not sure what you mean by "non current process error case" and "we > > should mark it AO", so could you explain more specifically about your error > > scenario? > I will share my test code and i will submit another patch to this scenario. > please give me some time, thanks! > And I think you are right, AR is only current process. > > > Especially I'd like to know about who triggers hard offline on > > what hardware events and what "wrong action" could happen. Maybe just > > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because > > it's not enough for us to see that your scenario is possible. Current > > implementation implicitly assumes some hardware behavior, and does not work > > for the case which never happens under the assumption. > > > This action is from mcelog daemon, normally softpage offlie is default, but we can configure > hardpage offline for CE storms, to get related processes signaled. Thanks, so which interface did you use for error injection? I guess first you used /sys/devices/system/memory/hard_offline_page, but if it's true, then the error event should be action optional (no MF_ACTION_REQUIRED set). So now I'm wondering why you are observing action required events? My another guess is that you might have used mce-inject tool, if that's true, please use hard_offline_page, then current kernel code should properly send SIGBUS to dedicated process. - Naoya
> Thanks, so which interface did you use for error injection? I guess first > you used /sys/devices/system/memory/hard_offline_page, but if it's true, > then the error event should be action optional (no MF_ACTION_REQUIRED set). > So now I'm wondering why you are observing action required events? > My another guess is that you might have used mce-inject tool, if that's true, > please use hard_offline_page, then current kernel code should properly send > SIGBUS to dedicated process. > The test has no relation to ce and hard_offline_page, sorry for misleading. if the test code will compiled to my_test bin, here is my script: ./my_test hola salut servus test haaa --- this case no early-kill flag set or ./my_test -s hola salut servus test haaa --- this case early-kill is set. there must be a process names "test" which will trigger the UE; when the my_test start, the shared page's physical address will be printed; In another console, I will use einj module to inject the 0X10 LEVEL error to this physical address. After that, the test process trigger the memory-failure process, and then the log should be observed. Thanks.
On Tue, Jan 19, 2021 at 02:04:18PM +0800, Aili Yao wrote: > > > Thanks, so which interface did you use for error injection? I guess first > > you used /sys/devices/system/memory/hard_offline_page, but if it's true, > > then the error event should be action optional (no MF_ACTION_REQUIRED set). > > So now I'm wondering why you are observing action required events? > > My another guess is that you might have used mce-inject tool, if that's true, > > please use hard_offline_page, then current kernel code should properly send > > SIGBUS to dedicated process. > > > The test has no relation to ce and hard_offline_page, sorry for misleading. > > if the test code will compiled to my_test bin, here is my script: > ./my_test hola salut servus test haaa --- this case no early-kill flag set > or > ./my_test -s hola salut servus test haaa --- this case early-kill is set. > > there must be a process names "test" which will trigger the UE; > > when the my_test start, the shared page's physical address will be printed; > In another console, I will use einj module to inject the 0X10 LEVEL error to this > physical address. Ah, OK, so the problem is becoming clearer, thanks. I'm feeling positive to change code to fall back to find_early_kill_thread(). So I'll take a look on v2. Thanks, Naoya Horiguchi
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..5e9e591c0929 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -457,8 +457,6 @@ static struct task_struct *task_early_kill(struct task_struct *tsk, */ if (tsk->mm == current->mm) return current; - else - return NULL; } return find_early_kill_thread(tsk); }