Message ID | 20211018114712.9802-3-mhocko@kernel.org (mailing list archive)
---|---
State | New, archived
Series | extend vmalloc support for constrained allocations
Fat fingers on my side. A follow up to fold:

---
commit 63d1b80b9a298c9380e5175e2add7025b6bd2600
Author: Michal Hocko <mhocko@suse.com>
Date:   Mon Oct 18 18:47:04 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3a5a178295d1..4ce9ccc33e33 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3034,7 +3034,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask && __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL)
 			goto again;
 		goto fail;
 	}
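The one-character fix matters because `&&` is a logical operator: `__GFP_NOFAIL` is a non-zero constant, so `gfp_mask && __GFP_NOFAIL` is true for any non-zero mask and every failing allocation would enter the retry loop. A minimal userspace sketch of the two predicates (the flag values below are stand-ins for illustration, not the kernel's real gfp bits):

```c
#include <stdbool.h>

/* Stand-in flag values for illustration only; the real constants live
 * in include/linux/gfp.h. */
#define FAKE_GFP_KERNEL 0x400u
#define FAKE_GFP_NOFAIL 0x8000u

/* Buggy predicate: logical AND is true whenever gfp_mask is non-zero,
 * because FAKE_GFP_NOFAIL itself is a non-zero constant. */
static bool should_retry_buggy(unsigned int gfp_mask)
{
	return gfp_mask && FAKE_GFP_NOFAIL;
}

/* Fixed predicate: bitwise AND tests whether the flag is actually set. */
static bool should_retry_fixed(unsigned int gfp_mask)
{
	return gfp_mask & FAKE_GFP_NOFAIL;
}
```

With the buggy version a plain GFP_KERNEL allocation would loop on failure instead of failing; the fixed version retries only when the caller actually asked for no-fail semantics.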
> From: Michal Hocko <mhocko@suse.com>
>
> Dave Chinner has mentioned that some of the xfs code would benefit from
> kvmalloc support for __GFP_NOFAIL because they have allocations that
> cannot fail and they do not fit into a single page.
>
> The large part of the vmalloc implementation already complies with the
> given gfp flags so there is no work to be done for those. The area
> and page table allocations are an exception to that. Implement a retry
> loop for those.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmalloc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7455c89598d3..3a5a178295d1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
>  		flags = memalloc_noio_save();
>
> -	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> +	do {
> +		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
>  			page_shift);
> +	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
>
>  	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		memalloc_nofs_restore(flags);
> @@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  		warn_alloc(gfp_mask, NULL,
>  			"vmalloc error: size %lu, vm_struct allocation failed",
>  			real_size);
> +		if (gfp_mask && __GFP_NOFAIL)
> +			goto again;
>  		goto fail;
>  	}
>
> --
> 2.30.2
>
I have checked the vmap code and how it aligns with the __GFP_NOFAIL
flag. To me it looks correct from a functional point of view.

There is one place though: kasan_populate_vmalloc_pte(). It does not use
gfp_mask; instead it directly deals with GFP_KERNEL for its internal
purpose. If that fails, the code will end up looping in
__vmalloc_node_range().

I am not sure how important it is to pass __GFP_NOFAIL into the KASAN
code. Any thoughts about it?

--
Vlad Rezki
On Tue 19-10-21 13:06:49, Uladzislau Rezki wrote:
> > From: Michal Hocko <mhocko@suse.com>
[...]
> I have checked the vmap code and how it aligns with the __GFP_NOFAIL
> flag. To me it looks correct from a functional point of view.
>
> There is one place though: kasan_populate_vmalloc_pte(). It does not use
> gfp_mask; instead it directly deals with GFP_KERNEL for its internal
> purpose. If that fails, the code will end up looping in
> __vmalloc_node_range().
>
> I am not sure how important it is to pass __GFP_NOFAIL into the KASAN
> code. Any thoughts about it?

The flag itself is not really necessary down there as long as we
guarantee that the high level logic doesn't fail. In this case we keep
retrying at the __vmalloc_node_range level, which should cover all
callers that can control the gfp mask. I was thinking of putting it into
__get_vm_area_node but that was slightly more hairy, and we would be
losing the warning, which might turn out to be helpful in cases where
the failure is due to lack of vmalloc space or a similar constraint.
Btw. do we want some throttling on a retry?

It would be better if the kasan part dealt with the gfp mask properly,
though; that is something we can do on top.
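The scheme Michal describes - keep the __GFP_NOFAIL guarantee at the top level rather than plumbing the flag into every nested allocation - can be sketched in userspace roughly like this (the names and flag value are made up for illustration):

```c
#include <stddef.h>

#define FAKE_GFP_NOFAIL 0x8000u

/* A low-level allocation step that may fail for reasons the caller
 * cannot control, standing in for the nested GFP_KERNEL allocations
 * (page tables, kasan shadow) done under __vmalloc_node_range(). */
static int attempts;
static void *flaky_area_alloc(void)
{
	return (++attempts < 3) ? NULL : &attempts;
}

/* Retry at the top level: the nested allocation never sees
 * __GFP_NOFAIL, but the caller-visible guarantee still holds
 * because this loop never gives up while the flag is set. */
static void *alloc_area(unsigned int gfp_mask)
{
	void *p;

	do {
		p = flaky_area_alloc();
		/* the real code throttles here (cond_resched() or a
		 * short sleep) before retrying */
	} while (p == NULL && (gfp_mask & FAKE_GFP_NOFAIL));

	return p;
}
```

The design trade-off discussed in the thread is visible here: the retry site keeps the warning and the loop in one place, at the cost of the nested code never knowing the allocation must not fail.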
On Tue, Oct 19, 2021 at 01:52:07PM +0200, Michal Hocko wrote:
> On Tue 19-10-21 13:06:49, Uladzislau Rezki wrote:
[...]
> The flag itself is not really necessary down there as long as we
> guarantee that the high level logic doesn't fail. In this case we keep
> retrying at the __vmalloc_node_range level, which should cover all
> callers that can control the gfp mask. [...] Btw. do we want some
> throttling on a retry?
>
I think adding some kind of schedule() will not make things worse and in
corner cases could prevent a power drain by the CPU. That is important
for mobile devices.

As for vmap space, it can be that a user specifies a short range that
does not contain any free area. In that case we might never return back
to the caller.

Maybe add a good comment, something like: think about what you are doing
when you deal with __vmalloc_node_range() and __GFP_NOFAIL?

--
Vlad Rezki
On Tue 19-10-21 21:46:58, Uladzislau Rezki wrote:
> On Tue, Oct 19, 2021 at 01:52:07PM +0200, Michal Hocko wrote:
[...]
> I think adding some kind of schedule() will not make things worse and in
> corner cases could prevent a power drain by the CPU. That is important
> for mobile devices.

I suspect you mean schedule_timeout here? Or cond_resched? I went with
the latter for now; I do not have a good idea of how long to sleep here.
I am more than happy to change it to a sleep though.

> As for vmap space, it can be that a user specifies a short range that
> does not contain any free area. In that case we might never return back
> to the caller.

This is to be expected. The caller cannot fail, and if it were looping
around vmalloc it wouldn't return anyway.

> Maybe add a good comment, something like: think about what you are doing
> when you deal with __vmalloc_node_range() and __GFP_NOFAIL?

We have generic documentation for gfp flags, and __GFP_NOFAIL is
documented as "The allocation could block indefinitely but will never
return with failure." We are discussing improvements to the generic
documentation in another thread [1] and will likely extend it, so I
suspect we do not have to repeat the drawbacks here again.

[1] http://lkml.kernel.org/r/163184741778.29351.16920832234899124642.stgit@noble.brown

Anyway, the gfp mask description and constraints for vmalloc are not
documented. I will add a new patch to fill that gap and send it as a
reply to this one.
On Wed 20-10-21 10:25:06, Michal Hocko wrote:
[...]
> > I think adding some kind of schedule() will not make things worse and in
> > corner cases could prevent a power drain by the CPU. That is important
> > for mobile devices.
>
> I suspect you mean schedule_timeout here? Or cond_resched? I went with
> the latter for now; I do not have a good idea of how long to sleep here.
> I am more than happy to change it to a sleep though.

Forgot to paste the follow up I have staged currently:

---
commit 66fea55e5543fa234692a70144cd63c7a1bca32f
Author: Michal Hocko <mhocko@suse.com>
Date:   Wed Oct 20 10:12:45 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"

    - add cond_resched

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fb5413d9239..f7098e616883 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	do {
 		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+		cond_resched();
 	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));

 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
@@ -3034,8 +3035,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask & __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL) {
+			cond_resched();
 			goto again;
+		}
 		goto fail;
 	}
> > I think adding some kind of schedule() will not make things worse and in
> > corner cases could prevent a power drain by the CPU. That is important
> > for mobile devices.
>
> I suspect you mean schedule_timeout here? Or cond_resched? I went with
> the latter for now; I do not have a good idea of how long to sleep here.
> I am more than happy to change it to a sleep though.
>
cond_resched() reschedules only if TIF_NEED_RESCHED is raised, which is
not good here, because in our case we know that we definitely would like
to take a breath. Therefore invoking schedule() is more suitable: it
will give CPU time to another waiting process (if one exists) in any
case, putting the "current" one at the tail.

As for adding a delay: I am not sure how long to delay, or rather I do
not see a good explanation for why we would delay for, say, 10
milliseconds.

> > As for vmap space, it can be that a user specifies a short range that
> > does not contain any free area. In that case we might never return back
> > to the caller.
>
> This is to be expected. The caller cannot fail, and if it were looping
> around vmalloc it wouldn't return anyway.
>
> > Maybe add a good comment, something like: think about what you are doing
> > when you deal with __vmalloc_node_range() and __GFP_NOFAIL?
>
> We have generic documentation for gfp flags, and __GFP_NOFAIL is
> documented as "The allocation could block indefinitely but will never
> return with failure."
[...]
> Anyway, the gfp mask description and constraints for vmalloc are not
> documented. I will add a new patch to fill that gap and send it as a
> reply to this one.
>
This is really good. People should be prepared for the case when it
never returns back to the caller :)
On Wed 20-10-21 15:54:23, Uladzislau Rezki wrote:
[...]
> cond_resched() reschedules only if TIF_NEED_RESCHED is raised, which is
> not good here, because in our case we know that we definitely would like
> to take a breath. Therefore invoking schedule() is more suitable: it
> will give CPU time to another waiting process (if one exists) in any
> case, putting the "current" one at the tail.

Yes, but there is no explicit event to wait for currently.

> As for adding a delay: I am not sure how long to delay, or rather I do
> not see a good explanation for why we would delay for, say, 10
> milliseconds.

As I've said, I am OK with either of the two. Do you or anybody else
have a preference? Without an explicit event to wake up for, neither of
the two is more than just an optimistic retry.
On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> On Wed 20-10-21 15:54:23, Uladzislau Rezki wrote:
[...]
> As I've said, I am OK with either of the two. Do you or anybody else
> have a preference? Without an explicit event to wake up for, neither of
> the two is more than just an optimistic retry.
>
From a power perspective it is better to have a delay, so I tend to say
that a delay is better.
On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> From a power perspective it is better to have a delay, so I tend to say
> that a delay is better.

I am a terrible random number generator. Can you give me a number
please?
> On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
[...]
> I am a terrible random number generator. Can you give me a number
> please?
>
Well, we can start from one jiffy, so one timer tick: schedule_timeout(1)
On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
[...]
> Well, we can start from one jiffy, so one timer tick: schedule_timeout(1)
>
A small nit: it would be better to replace that with a simple msleep()
call: msleep(jiffies_to_msecs(1));

--
Vlad Rezki
On Wed 20-10-21 21:24:30, Uladzislau Rezki wrote:
[...]
> > Well, we can start from one jiffy, so one timer tick: schedule_timeout(1)

OK, I will go with 1 jiffy.

> A small nit: it would be better to replace that with a simple msleep()
> call: msleep(jiffies_to_msecs(1));

I had planned to use schedule_timeout_uninterruptible. Why do you think
msleep is better?
On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
[...]
> A small nit: it would be better to replace that with a simple msleep()
> call: msleep(jiffies_to_msecs(1));

I disagree. I think schedule_timeout_uninterruptible(1) is the best way
to sleep for 1 tick.

msleep() contains
	timeout = msecs_to_jiffies(msecs) + 1;
and both jiffies_to_msecs() and msecs_to_jiffies() might round up too.
So you will sleep for at least twice as long as you asked for, possibly
more.

NeilBrown
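Neil's rounding argument can be checked with simplified userspace stand-ins for the conversion helpers. These assume one fixed HZ value; the kernel's real implementations pick different optimized paths per HZ configuration, but the arithmetic below shows the effect:

```c
/* Simplified stand-ins for the kernel's jiffies<->msecs helpers,
 * assuming HZ = 250 so one tick is exactly 4 ms. */
#define HZ 250u

static unsigned int fake_jiffies_to_msecs(unsigned int j)
{
	return (1000u / HZ) * j;		/* exact when HZ divides 1000 */
}

static unsigned int fake_msecs_to_jiffies(unsigned int ms)
{
	return (ms * HZ + 999u) / 1000u;	/* rounds up */
}

/* msleep(m) internally waits msecs_to_jiffies(m) + 1 jiffies, so
 * msleep(jiffies_to_msecs(1)) ends up waiting 2 ticks, not 1. */
static unsigned int msleep_ticks_for_one_jiffy(void)
{
	return fake_msecs_to_jiffies(fake_jiffies_to_msecs(1)) + 1;
}
```

By contrast, schedule_timeout_uninterruptible(1) waits the single tick that was asked for.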
On Thu 21-10-21 21:13:35, Neil Brown wrote:
[...]
> msleep() contains
> 	timeout = msecs_to_jiffies(msecs) + 1;
> and both jiffies_to_msecs() and msecs_to_jiffies() might round up too.
> So you will sleep for at least twice as long as you asked for, possibly
> more.

That was my thinking as well. Not to mention calling jiffies_to_msecs
just to do msecs_to_jiffies right after, which seems like a pointless
waste of CPU cycles. But maybe I was missing some other reason why
msleep would be superior.
> On Thu 21-10-21 21:13:35, Neil Brown wrote:
[...]
> That was my thinking as well. Not to mention calling jiffies_to_msecs
> just to do msecs_to_jiffies right after, which seems like a pointless
> waste of CPU cycles. But maybe I was missing some other reason why
> msleep would be superior.
>
To me msleep is just simpler from a semantic point of view, i.e. it is
as straightforward as it can be. The distinction between interruptible
and uninterruptible sleep can be confusing for people.

When it comes to rounding and the possibility of sleeping for more than
1 tick, it really does not matter here; we do not need to guarantee an
exact sleeping time.

Therefore I proposed switching to msleep().

--
Vlad Rezki
On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
[...]
> To me msleep is just simpler from a semantic point of view, i.e. it is
> as straightforward as it can be. The distinction between interruptible
> and uninterruptible sleep can be confusing for people.

I agree that msleep() is simpler. I think adding the jiffies_to_msecs()
substantially reduces that simplicity.

> When it comes to rounding and the possibility of sleeping for more than
> 1 tick, it really does not matter here; we do not need to guarantee an
> exact sleeping time.
>
> Therefore I proposed switching to msleep().

If, as you say, the precision doesn't matter that much, then maybe
	msleep(0)
which would sleep to the start of the next jiffy. Does that look a bit
weird? If so, then msleep(1) would be OK.

However, now that I've thought about it some more, I'd much prefer we
introduce something like
	memalloc_retry_wait();
and use that everywhere that a memory allocation is retried.

I'm not convinced that we need to wait at all - at least, not when
__GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either
- succeed,
- make some progress at reclaiming, or
- sleep.

However I'm not 100% certain, and the behaviour might change in the
future. So having one place (the definition of memalloc_retry_wait())
where we can change the sleeping behaviour if the alloc_page behaviour
changes would be ideal. Maybe memalloc_retry_wait() could take a
gfpflags arg.

Thanks,
NeilBrown
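One hypothetical shape for the helper Neil proposes - a single central policy for backing off between allocation retries, keyed on the gfp flags - might look like the sketch below. Everything here is an assumption drawn from the proposal in this email, not an existing kernel interface; the flag value is a stand-in:

```c
/* Hypothetical sketch of the proposed memalloc_retry_wait():
 * return how many ticks to back off between allocation retries,
 * so every retry site shares one policy that can be changed in
 * one place if the allocator's behaviour changes. */
#define FAKE_GFP_DIRECT_RECLAIM 0x400u

static unsigned int retry_wait_ticks(unsigned int gfp_flags)
{
	/* With direct reclaim allowed, the allocator itself already
	 * succeeded, made reclaim progress, or slept - so no extra
	 * wait may be needed, per the argument above. */
	if (gfp_flags & FAKE_GFP_DIRECT_RECLAIM)
		return 0;

	/* Otherwise back off for one tick before retrying. */
	return 1;
}
```

The point of centralizing the policy is exactly the one made above: if alloc_page's behaviour changes, only this one function needs updating, not every caller's retry loop.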
On Fri 22-10-21 09:49:08, Neil Brown wrote:
[...]
> However, now that I've thought about it some more, I'd much prefer we
> introduce something like
> 	memalloc_retry_wait();
> and use that everywhere that a memory allocation is retried.
> I'm not convinced that we need to wait at all - at least, not when
> __GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either
> - succeed,
> - make some progress at reclaiming, or
> - sleep.

There are two retries that we have to do explicitly. One is in
vmap_pages_range, due to the implicit GFP_KERNEL allocations for page
tables. Those would likely be a good fit for something like you suggest
above. Then we have the __get_vm_area_node retry loop, which can be
triggered either by a vmalloc space reservation failure or by an
implicit GFP_KERNEL allocation in kasan. The first one is not really
related to memory availability, so it doesn't sound like a good fit.
On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > On Thu, 21 Oct 2021, Uladzislau Rezki wrote: > > > On Thu 21-10-21 21:13:35, Neil Brown wrote: > > > > On Thu, 21 Oct 2021, Uladzislau Rezki wrote: > > > > > On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote: > > > > > > > > > > > > > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote: > > > > > > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote: > > > > > > > [...] > > > > > > > > > As I've said I am OK with either of the two. Do you or anybody have any > > > > > > > > > preference? Without any explicit event to wake up for neither of the two > > > > > > > > > is more than just an optimistic retry. > > > > > > > > > > > > > > > > > From power perspective it is better to have a delay, so i tend to say > > > > > > > > that delay is better. > > > > > > > > > > > > > > I am a terrible random number generator. Can you give me a number > > > > > > > please? > > > > > > > > > > > > > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1) > > > > > > > > > > > A small nit, it is better to replace it by the simple msleep() call: msleep(jiffies_to_msecs(1)); > > > > > > > > I disagree. I think schedule_timeout_uninterruptible(1) is the best > > > > wait to sleep for 1 ticl > > > > > > > > msleep() contains > > > > timeout = msecs_to_jiffies(msecs) + 1; > > > > and both jiffies_to_msecs and msecs_to_jiffies might round up too. > > > > So you will sleep for at least twice as long as you asked for, possible > > > > more. > > > > > > That was my thinking as well. Not to mention jiffies_to_msecs just to do > > > msecs_to_jiffies right after which seems like a pointless wasting of > > > cpu cycle. But maybe I was missing some other reasons why msleep would > > > be superior. > > > > > > > To me the msleep is just more simpler from semantic point of view, i.e. > > it is as straight forward as it can be. 
In case of interruptible/uninterruptible > > sleep it can be more confusing for people. > > I agree that msleep() is more simple. I think adding the > jiffies_to_msec() substantially reduces that simplicity. > > > > When it comes to rounding and possibility to sleep more than 1 tick, it > > really does not matter here, we do not need to guarantee exact sleeping > > time. > > > > Therefore i proposed to switch to the msleep(). > > If, as you say, the precision doesn't matter that much, then maybe > msleep(0) > which would sleep to the start of the next jiffy. Does that look a bit > weird? If so, the msleep(1) would be ok. > Agreed, msleep(1) looks much better than converting 1 jiffy to milliseconds. The result should be the same. > However now that I've thought about some more, I'd much prefer we > introduce something like > memalloc_retry_wait(); > > and use that everywhere that a memory allocation is retried. > I'm not convinced that we need to wait at all - at least, not when > __GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either > - succeed > - make some progress a reclaiming or > - sleep > > However I'm not 100% certain, and the behaviour might change in the > future. So having one place (the definition of memalloc_retry_wait()) > where we can change the sleeping behaviour if the alloc_page behavour > changes, would be ideal. Maybe memalloc_retry_wait() could take a > gfpflags arg. > A sleep is required for __get_vm_area_node() because, in case of lack of vmap space, it would end up in a tight loop without sleeping, which is really bad. Thanks! -- Vlad Rezki
On Mon 25-10-21 11:48:41, Uladzislau Rezki wrote: > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: [...] > > If, as you say, the precision doesn't matter that much, then maybe > > msleep(0) > > which would sleep to the start of the next jiffy. Does that look a bit > > weird? If so, the msleep(1) would be ok. > > > Agree, msleep(1) looks much better rather then converting 1 jiffy to > milliseconds. Result should be the same. I would really prefer if this were not the main point of arguing here. Unless you feel strongly about msleep I would go with schedule_timeout here because it is a more widely used interface in the mm code and also because I feel that relying on the rounding behavior is just too subtle. Here is what I have staged now. Are there any other concerns you see with this or other patches in the series? Thanks!

---
commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0
Author: Michal Hocko <mhocko@suse.com>
Date: Wed Oct 20 10:12:45 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"

    Add a short sleep before retrying. 1 jiffy is a completely random
    timeout. Ideally the retry would wait for an explicit event - e.g.
    a change to the vmalloc space if the failure was caused by space
    fragmentation or depletion. But there are multiple different
    reasons to retry and this could become much more complex. Keep the
    retry simple for now and just sleep to prevent hogging CPUs.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fb5413d9239..a866db0c9c31 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	do {
 		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+		schedule_timeout_uninterruptible(1);
 	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
 
 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
@@ -3034,8 +3035,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask & __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL) {
+			schedule_timeout_uninterruptible(1);
 			goto again;
+		}
 		goto fail;
 	}
> > I would really prefer if this was not the main point of arguing here. > Unless you feel strongly about msleep I would go with schedule_timeout > here because this is a more widely used interface in the mm code and > also because I feel like that relying on the rounding behavior is just > subtle. Here is what I have staged now. > I have a preference but do not have a strong opinion here. You can go either way you want. > > Are there any other concerns you see with this or other patches in the > series? > It would be better if you could send a new vX version, because it is hard to combine every "fold me" into one solid commit. One comment below: > --- > commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0 > Author: Michal Hocko <mhocko@suse.com> > Date: Wed Oct 20 10:12:45 2021 +0200 > > fold me "mm/vmalloc: add support for __GFP_NOFAIL" > > Add a short sleep before retrying. 1 jiffy is a completely random > timeout. Ideally the retry would wait for an explicit event - e.g. > a change to the vmalloc space change if the failure was caused by > the space fragmentation or depletion. But there are multiple different > reasons to retry and this could become much more complex. Keep the retry > simple for now and just sleep to prevent from hogging CPUs. > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index 0fb5413d9239..a866db0c9c31 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > do { > ret = vmap_pages_range(addr, addr + size, prot, area->pages, > page_shift); > + schedule_timeout_uninterruptible(1); > We do not want to call schedule_timeout_uninterruptible(1) every time, only when an error is detected.
On Mon 25-10-21 16:30:23, Uladzislau Rezki wrote: > > > > I would really prefer if this was not the main point of arguing here. > > Unless you feel strongly about msleep I would go with schedule_timeout > > here because this is a more widely used interface in the mm code and > > also because I feel like that relying on the rounding behavior is just > > subtle. Here is what I have staged now. > > > I have a preference but do not have a strong opinion here. You can go > either way you want. > > > > > > Are there any other concerns you see with this or other patches in the > > series? > > > it is better if you could send a new vX version because it is hard to > combine every "folded" Yeah, I plan to soon. I just wanted to sort out most things before spamming with a new version. > into one solid commit. One comment below: > > > --- > > commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0 > > Author: Michal Hocko <mhocko@suse.com> > > Date: Wed Oct 20 10:12:45 2021 +0200 > > > > fold me "mm/vmalloc: add support for __GFP_NOFAIL" > > > > Add a short sleep before retrying. 1 jiffy is a completely random > > timeout. Ideally the retry would wait for an explicit event - e.g. > > a change to the vmalloc space change if the failure was caused by > > the space fragmentation or depletion. But there are multiple different > > reasons to retry and this could become much more complex. Keep the retry > > simple for now and just sleep to prevent from hogging CPUs. > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index 0fb5413d9239..a866db0c9c31 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > do { > > ret = vmap_pages_range(addr, addr + size, prot, area->pages, > > page_shift); > > + schedule_timeout_uninterruptible(1); > > > We do not want to schedule_timeout_uninterruptible(1); every time. > Only when an error is detected.
Because I was obviously in a brainless mode when doing that one. Thanks for pointing this out!
On Mon, 25 Oct 2021, Uladzislau Rezki wrote: > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > > However I'm not 100% certain, and the behaviour might change in the > > future. So having one place (the definition of memalloc_retry_wait()) > > where we can change the sleeping behaviour if the alloc_page behavour > > changes, would be ideal. Maybe memalloc_retry_wait() could take a > > gfpflags arg. > > > At sleeping is required for __get_vm_area_node() because in case of lack > of vmap space it will end up in tight loop without sleeping what is > really bad. > So vmalloc() has two failure modes. alloc_page() failure and __alloc_vmap_area() failure. The caller cannot tell which... Actually, they can. If we pass __GFP_NOFAIL to vmalloc(), and it fails, then it must have been __alloc_vmap_area() which failed. What do we do in that case? Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy() finishes? If we use the spinlock from that waitq in place of free_vmap_area_lock, then the wakeup would be nearly free if no-one was waiting, and worth while if someone was waiting. Thanks, NeilBrown
On Tue 26-10-21 10:50:21, Neil Brown wrote: > On Mon, 25 Oct 2021, Uladzislau Rezki wrote: > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > > > However I'm not 100% certain, and the behaviour might change in the > > > future. So having one place (the definition of memalloc_retry_wait()) > > > where we can change the sleeping behaviour if the alloc_page behavour > > > changes, would be ideal. Maybe memalloc_retry_wait() could take a > > > gfpflags arg. > > > > > At sleeping is required for __get_vm_area_node() because in case of lack > > of vmap space it will end up in tight loop without sleeping what is > > really bad. > > > So vmalloc() has two failure modes. alloc_page() failure and > __alloc_vmap_area() failure. The caller cannot tell which... > > Actually, they can. If we pass __GFP_NOFAIL to vmalloc(), and it fails, > then it must have been __alloc_vmap_area() which failed. > What do we do in that case? > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy() > finishes? > If we use the spinlock from that waitq in place of free_vmap_area_lock, > then the wakeup would be nearly free if no-one was waiting, and worth > while if someone was waiting. Is this really required to be part of the initial support?
On Tue, 26 Oct 2021, Michal Hocko wrote: > On Tue 26-10-21 10:50:21, Neil Brown wrote: > > On Mon, 25 Oct 2021, Uladzislau Rezki wrote: > > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > > > > However I'm not 100% certain, and the behaviour might change in the > > > > future. So having one place (the definition of memalloc_retry_wait()) > > > > where we can change the sleeping behaviour if the alloc_page behavour > > > > changes, would be ideal. Maybe memalloc_retry_wait() could take a > > > > gfpflags arg. > > > > > > > At sleeping is required for __get_vm_area_node() because in case of lack > > > of vmap space it will end up in tight loop without sleeping what is > > > really bad. > > > > > So vmalloc() has two failure modes. alloc_page() failure and > > __alloc_vmap_area() failure. The caller cannot tell which... > > > > Actually, they can. If we pass __GFP_NOFAIL to vmalloc(), and it fails, > > then it must have been __alloc_vmap_area() which failed. > > What do we do in that case? > > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy() > > finishes? > > If we use the spinlock from that waitq in place of free_vmap_area_lock, > > then the wakeup would be nearly free if no-one was waiting, and worth > > while if someone was waiting. > > Is this really required to be part of the initial support? No.... I was just thinking out-loud. NeilBrown
On Tue, Oct 26, 2021 at 12:24 PM NeilBrown <neilb@suse.de> wrote: > > On Tue, 26 Oct 2021, Michal Hocko wrote: > > On Tue 26-10-21 10:50:21, Neil Brown wrote: > > > On Mon, 25 Oct 2021, Uladzislau Rezki wrote: > > > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > > > > > However I'm not 100% certain, and the behaviour might change in the > > > > > future. So having one place (the definition of memalloc_retry_wait()) > > > > > where we can change the sleeping behaviour if the alloc_page behavour > > > > > changes, would be ideal. Maybe memalloc_retry_wait() could take a > > > > > gfpflags arg. > > > > > > > > > At sleeping is required for __get_vm_area_node() because in case of lack > > > > of vmap space it will end up in tight loop without sleeping what is > > > > really bad. > > > > > > > So vmalloc() has two failure modes. alloc_page() failure and > > > __alloc_vmap_area() failure. The caller cannot tell which... > > > > > > Actually, they can. If we pass __GFP_NOFAIL to vmalloc(), and it fails, > > > then it must have been __alloc_vmap_area() which failed. > > > What do we do in that case? > > > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy() > > > finishes? > > > If we use the spinlock from that waitq in place of free_vmap_area_lock, > > > then the wakeup would be nearly free if no-one was waiting, and worth > > > while if someone was waiting. > > > > Is this really required to be part of the initial support? > > No.... I was just thinking out-loud. > alloc_vmap_area() has a retry path: if it fails, the code will try to "purge" the lazily freed areas and repeat the allocation one more time. So we do not need to purge anywhere else.
On Tue 26-10-21 16:25:07, Uladzislau Rezki wrote: > On Tue, Oct 26, 2021 at 12:24 PM NeilBrown <neilb@suse.de> wrote: > > > > On Tue, 26 Oct 2021, Michal Hocko wrote: > > > On Tue 26-10-21 10:50:21, Neil Brown wrote: > > > > On Mon, 25 Oct 2021, Uladzislau Rezki wrote: > > > > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote: > > > > > > However I'm not 100% certain, and the behaviour might change in the > > > > > > future. So having one place (the definition of memalloc_retry_wait()) > > > > > > where we can change the sleeping behaviour if the alloc_page behavour > > > > > > changes, would be ideal. Maybe memalloc_retry_wait() could take a > > > > > > gfpflags arg. > > > > > > > > > > > At sleeping is required for __get_vm_area_node() because in case of lack > > > > > of vmap space it will end up in tight loop without sleeping what is > > > > > really bad. > > > > > > > > > So vmalloc() has two failure modes. alloc_page() failure and > > > > __alloc_vmap_area() failure. The caller cannot tell which... > > > > > > > > Actually, they can. If we pass __GFP_NOFAIL to vmalloc(), and it fails, > > > > then it must have been __alloc_vmap_area() which failed. > > > > What do we do in that case? > > > > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy() > > > > finishes? > > > > If we use the spinlock from that waitq in place of free_vmap_area_lock, > > > > then the wakeup would be nearly free if no-one was waiting, and worth > > > > while if someone was waiting. > > > > > > Is this really required to be part of the initial support? > > > > No.... I was just thinking out-loud. > > > alloc_vmap_area() has an retry path, basically if it fails the code > will try to "purge" > areas and repeat it one more time. So we do not need to purge outside some where > else. I think that Neil was not concerned about the need for purging something but rather a waiting event the retry loop could hook into. 
So that the sleep wouldn't have to be a random timeout but something that is actually actionable - like somebody freeing an area.
> > > > Is this really required to be part of the initial support? > > > > > > No.... I was just thinking out-loud. > > > > > alloc_vmap_area() has an retry path, basically if it fails the code > > will try to "purge" > > areas and repeat it one more time. So we do not need to purge outside some where > > else. > > I think that Neil was not concerned about the need for purging something > but rather a waiting event the retry loop could hook into. So that the > sleep wouldn't have to be a random timeout but something that is > actually actionable - like somebody freeing an area. > I see the point. But sometimes it is not as straightforward as that. If we lack vmap space within a specific range, it might not be about reclaiming (nobody frees into that area and there are no outstanding lazily freed areas), thus we can do nothing.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7455c89598d3..3a5a178295d1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
 		flags = memalloc_noio_save();
 
-	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
+	do {
+		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
 
 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		memalloc_nofs_restore(flags);
@@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
+		if (gfp_mask && __GFP_NOFAIL)
+			goto again;
 		goto fail;
 	}