Message ID | 20170109221831.GC3800@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Hi Paul, On Mon, Jan 9, 2017 at 11:18 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Mon, Jan 09, 2017 at 10:33:29AM +0100, Borislav Petkov wrote: >> + Paul for comment. >> >> Leaving in the rest for him. >> >> On Mon, Jan 09, 2017 at 02:36:33AM +0000, Zheng, Lv wrote: >> > Hi, >> > >> > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng, >> > > Lv >> > > Subject: RE: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and >> > > early_acpi_os_unmap_memory() from Linux kernel >> > > >> > > Hi, >> > > >> > > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of >> > > Borislav >> > > > Petkov >> > > > Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and >> > > > early_acpi_os_unmap_memory() from Linux kernel >> > > > >> > > > On Sun, Jan 08, 2017 at 03:20:20AM +0100, Rafael J. Wysocki wrote: >> > > > > drivers/iommu/amd_iommu_init.c | 2 +- >> > > > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > > > >> > > > > Index: linux-pm/drivers/iommu/amd_iommu_init.c >> > > > > =================================================================== >> > > > > --- linux-pm.orig/drivers/iommu/amd_iommu_init.c >> > > > > +++ linux-pm/drivers/iommu/amd_iommu_init.c >> > > > > @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v >> > > > > */ >> > > > > ret = check_ivrs_checksum(ivrs_base); >> > > > > if (ret) >> > > > > - return ret; >> > > > > + goto out; >> > > > > >> > > > > amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base); >> > > > > DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type); >> > > > >> > > > Good catch, this one needs to be applied regardless. >> > > > >> > > > However, it doesn't fix my issue though. >> > > > >> > > > But I think I have it - I went and applied the well-proven debugging >> > > > technique of sprinkling printks around. Here's what I'm seeing: >> > > > >> > > > early_amd_iommu_init() >> > > > |-> acpi_put_table(ivrs_base); >> > > > |-> acpi_tb_put_table(table_desc); >> > > > |-> acpi_tb_invalidate_table(table_desc); >> > > > |-> acpi_tb_release_table(...) >> > > > |-> acpi_os_unmap_memory >> > > > |-> acpi_os_unmap_iomem >> > > > |-> acpi_os_map_cleanup >> > > > |-> synchronize_rcu_expedited <-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y >> > > > >> > > > Now that function goes and sends IPIs, i.e., schedule_work() >> > > > but this is too early - we haven't even done workqueue_init(). >> > > > Actually, from looking at the callstack, we do >> > > > kernel_init_freeable->native_smp_prepare_cpus() and workqueue_init() >> > > > comes next. >> > > > >> > > > And this makes sense because the splat rIP points to __queue_work() but >> > > > we haven't done that yet. >> > > > >> > > > So that acpi_put_table() is happening too early. Looks like AMD IOMMU >> > > > should not put the table but WTH do I know?! >> > > > >> > > > In any case, commenting out: >> > > > >> > > > acpi_put_table(ivrs_base); >> > > > ivrs_base = NULL; >> > > > >> > > > and the end of early_amd_iommu_init() makes the box boot again. >> > > >> > > So please help to comment out these 2 lines (with descriptions and do not delete them). >> > > Until acpi_os_unmap_memory() is able to handle such an early case. >> > >> > IMO, synchronize_rcu_expedited() should be improved: >> > If rcu_init() isn't called or there is nothing to synchronize, schedule_work() shouldn't be invoked. > > Indeed it should! > > Does the (untested) patch below fix things for you? > > If so, does this need to go into 4.10? (My default workflow would get > it into 4.11 or 4.12, so please speak up if you need it.) Yes it should go into 4.10 (if it fixes the problem) as the reported regression was introduced in 4.10-rc1. > > ------------------------------------------------------------------------ > > commit 1b7feb708241f1662cfd529118468c9f9c0b1449 > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Mon Jan 9 14:10:50 2017 -0800 > > rcu: Make synchronize_rcu_expedited() safe for early boot > > The synchronize_rcu_expedited() function does not check for early-boot > use, which can result in failures if it is invoked before the scheduler > has started. Given that the rcupdate.rcu_expedited kernel parameter > causes all calls to synchronize_rcu() to be directed instead to > synchronize_rcu_expedited(), a usage restriction does not make sense. > > This commit therefore adds a rcu_scheduler_active check to > synchronize_rcu_expedited(), so that it is a no-op before the scheduler > starts. This behavior is correct because there is only a single CPU > running during that time. > > Reported-by: Lv Zheng <lv.zheng@intel.com> > Reported-by: Borislav Petkov <bp@alien8.de> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> When applying this, please also add: Fixes: 174cc7187e6f ("ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel") Link: https://lkml.kernel.org/r/4034dde8-ffc1-18e2-f40c-00cf37471793@intel.com > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index dfc3ba5a429e..a6c3d86480de 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) > { > struct rcu_state *rsp = rcu_state_p; > > + if (!rcu_scheduler_active) > + return; > _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); > } > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote: > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) > { > struct rcu_state *rsp = rcu_state_p; > > + if (!rcu_scheduler_active) > + return; > _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); > } > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); That doesn't work and it is because of those damn what goes before what boot sequence issues :-\ We have: rest_init() |-> rcu_scheduler_starting() ---> that sets rcu_scheduler_active = 1; |-> kernel_thread(kernel_init, NULL, CLONE_FS); |-> kernel_init() |-> kernel_init_freeable() |-> native_smp_prepare_cpus(setup_max_cpus) |-> default_setup_apic_routing |-> enable_IR_x2apic |-> irq_remapping_prepare |-> amd_iommu_prepare |-> iommu_go_to_state |-> acpi_put_table(ivrs_base); |-> acpi_tb_put_table(table_desc); |-> acpi_tb_invalidate_table(table_desc); |-> acpi_tb_release_table(...) |-> acpi_os_unmap_memory |-> acpi_os_unmap_iomem |-> acpi_os_map_cleanup |-> synchronize_rcu_expedited() Now here we have rcu_scheduler_active already set so the test doesn't hit and we hang. So we must do it differently.
On Tue, Jan 10, 2017 at 12:15:01AM +0100, Borislav Petkov wrote: > On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote: > > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) > > { > > struct rcu_state *rsp = rcu_state_p; > > > > + if (!rcu_scheduler_active) > > + return; > > _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); > > } > > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); > > That doesn't work and it is because of those damn what goes before what > boot sequence issues :-\ > > We have: > > rest_init() > |-> rcu_scheduler_starting() ---> that sets rcu_scheduler_active = 1; > |-> kernel_thread(kernel_init, NULL, CLONE_FS); > |-> kernel_init() > |-> kernel_init_freeable() > |-> native_smp_prepare_cpus(setup_max_cpus) > |-> default_setup_apic_routing > |-> enable_IR_x2apic > |-> irq_remapping_prepare > |-> amd_iommu_prepare > |-> iommu_go_to_state > |-> acpi_put_table(ivrs_base); > |-> acpi_tb_put_table(table_desc); > |-> acpi_tb_invalidate_table(table_desc); > |-> acpi_tb_release_table(...) > |-> acpi_os_unmap_memory > |-> acpi_os_unmap_iomem > |-> acpi_os_map_cleanup > |-> synchronize_rcu_expedited() > > Now here we have rcu_scheduler_active already set so the test doesn't > hit and we hang. > > So we must do it differently. Yeah, there is a window just as the scheduler is starting where things don't work. We could move rcu_scheduler_starting() later, as long as there is no chance of preemption or context switch before it is invoked. Would that help in this case, or are we already context switching before acpi_os_map_cleanup() is invoked? (If we are already context switching, short-circuiting synchronize_rcu_expedited() would be a bug.) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 10, 2017 at 12:32 AM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Tue, Jan 10, 2017 at 12:15:01AM +0100, Borislav Petkov wrote: >> On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote: >> > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) >> > { >> > struct rcu_state *rsp = rcu_state_p; >> > >> > + if (!rcu_scheduler_active) >> > + return; >> > _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); >> > } >> > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); >> >> That doesn't work and it is because of those damn what goes before what >> boot sequence issues :-\ >> >> We have: >> >> rest_init() >> |-> rcu_scheduler_starting() ---> that sets rcu_scheduler_active = 1; >> |-> kernel_thread(kernel_init, NULL, CLONE_FS); >> |-> kernel_init() >> |-> kernel_init_freeable() >> |-> native_smp_prepare_cpus(setup_max_cpus) >> |-> default_setup_apic_routing >> |-> enable_IR_x2apic >> |-> irq_remapping_prepare >> |-> amd_iommu_prepare >> |-> iommu_go_to_state >> |-> acpi_put_table(ivrs_base); >> |-> acpi_tb_put_table(table_desc); >> |-> acpi_tb_invalidate_table(table_desc); >> |-> acpi_tb_release_table(...) >> |-> acpi_os_unmap_memory >> |-> acpi_os_unmap_iomem >> |-> acpi_os_map_cleanup >> |-> synchronize_rcu_expedited() >> >> Now here we have rcu_scheduler_active already set so the test doesn't >> hit and we hang. >> >> So we must do it differently. > > Yeah, there is a window just as the scheduler is starting where things don't > work. > > We could move rcu_scheduler_starting() later, as long as there > is no chance of preemption or context switch before it is invoked. > Would that help in this case, or are we already context switching before > acpi_os_map_cleanup() is invoked? In the particular AMD IOMMU case it doesn't look like we are, but we do in other cases. > (If we are already context switching, > short-circuiting synchronize_rcu_expedited() would be a bug.) It may be easier to make the caller avoid RCU synchronization altogether if that's not necessary and the caller should actually be able to figure out when that's the case. The patch from Lv at https://patchwork.kernel.org/patch/9504277/ goes in the right direction IMO, but I'm not yet convinced that this is the right one. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 10, 2017 at 12:42:47AM +0100, Rafael J. Wysocki wrote: > On Tue, Jan 10, 2017 at 12:32 AM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Tue, Jan 10, 2017 at 12:15:01AM +0100, Borislav Petkov wrote: > >> On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote: > >> > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) > >> > { > >> > struct rcu_state *rsp = rcu_state_p; > >> > > >> > + if (!rcu_scheduler_active) > >> > + return; > >> > _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); > >> > } > >> > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); > >> > >> That doesn't work and it is because of those damn what goes before what > >> boot sequence issues :-\ > >> > >> We have: > >> > >> rest_init() > >> |-> rcu_scheduler_starting() ---> that sets rcu_scheduler_active = 1; > >> |-> kernel_thread(kernel_init, NULL, CLONE_FS); > >> |-> kernel_init() > >> |-> kernel_init_freeable() > >> |-> native_smp_prepare_cpus(setup_max_cpus) > >> |-> default_setup_apic_routing > >> |-> enable_IR_x2apic > >> |-> irq_remapping_prepare > >> |-> amd_iommu_prepare > >> |-> iommu_go_to_state > >> |-> acpi_put_table(ivrs_base); > >> |-> acpi_tb_put_table(table_desc); > >> |-> acpi_tb_invalidate_table(table_desc); > >> |-> acpi_tb_release_table(...) > >> |-> acpi_os_unmap_memory > >> |-> acpi_os_unmap_iomem > >> |-> acpi_os_map_cleanup > >> |-> synchronize_rcu_expedited() > >> > >> Now here we have rcu_scheduler_active already set so the test doesn't > >> hit and we hang. > >> > >> So we must do it differently. > > > > Yeah, there is a window just as the scheduler is starting where things don't > > work. > > > > We could move rcu_scheduler_starting() later, as long as there > > is no chance of preemption or context switch before it is invoked. > > Would that help in this case, or are we already context switching before > > acpi_os_map_cleanup() is invoked? > > In the particular AMD IOMMU case it doesn't look like we are, but we > do in other cases. > > > (If we are already context switching, > > short-circuiting synchronize_rcu_expedited() would be a bug.) > > It may be easier to make the caller avoid RCU synchronization > altogether if that's not necessary and the caller should actually be > able to figure out when that's the case. > > The patch from Lv at https://patchwork.kernel.org/patch/9504277/ goes > in the right direction IMO, but I'm not yet convinced that this is the > right one. From the RCU end, I could force expedited grace periods to translate to normal grace periods during that window of time, and then make sure that RCU's grace-period kthreads are spawned beforehand. Looking into this... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index dfc3ba5a429e..a6c3d86480de 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void) { struct rcu_state *rsp = rcu_state_p; + if (!rcu_scheduler_active) + return; _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);