Message ID | 1457710143-29182-2-git-send-email-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Fri, Mar 11, 2016 at 04:28:53PM +0100, Sebastian Andrzej Siewior wrote: > for_each_possible_cpu() with a cpu_online() + `thread' check possibly does > the job. But there is a tiny race: Say CPU5 is reported online but is > going down. And after fcoe_percpu_clean() saw that CPU5 is online it > decided to enqueue a packet. After dev_alloc_skb() returned a skb > that CPU is offline (or say the notifier destroyed the kthread). So we > would OOps because `thread' is NULL. > An alternative would be to lock the CPUs during our loop (so no CPU is > going away) and then we iterate over the online mask. I've looked over this and the following patches, and I suspect the right thing to do for fcoe and bnx2 is to convert them to use the generic workqueue code instead of reinventing it poorly. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/11/2016 05:17 PM, Christoph Hellwig wrote: > On Fri, Mar 11, 2016 at 04:28:53PM +0100, Sebastian Andrzej Siewior wrote: >> for_each_possible_cpu() with a cpu_online() + `thread' check possibly does >> the job. But there is a tiny race: Say CPU5 is reported online but is >> going down. And after fcoe_percpu_clean() saw that CPU5 is online it >> decided to enqueue a packet. After dev_alloc_skb() returned a skb >> that CPU is offline (or say the notifier destroyed the kthread). So we >> would OOps because `thread' is NULL. >> An alternative would be to lock the CPUs during our loop (so no CPU is >> going away) and then we iterate over the online mask. > > I've looked over this and the following patches, and I suspect > the right thing to do for fcoe and bnx2 is to convert them to use the > generic workqueue code instead of reinventing it poorly. alloc_workqueue() in setup and then queue_work_on(cpu, , item)? item should be struct work_struct but all I have is a skb. Is there an easy way to get this attached? Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 11, 2016 at 05:32:15PM +0100, Sebastian Andrzej Siewior wrote: > alloc_workqueue() in setup and then queue_work_on(cpu, , item)? item > should be struct work_struct but all I have is a skb. Is there an easy > way to get this attached? Good question. There is skb->cb, but it looks like it doesn't have space for an additional work_item in the fcoe case. Maybe have a per-cpu work_struct and keep all the list handling as-is for now? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/15/2016 09:19 AM, Christoph Hellwig wrote: > On Fri, Mar 11, 2016 at 05:32:15PM +0100, Sebastian Andrzej Siewior wrote: >> alloc_workqueue() in setup and then queue_work_on(cpu, , item)? item >> should be struct work_struct but all I have is a skb. Is there an easy >> way to get this attached? > > Good question. There is skb->cb, but it looks like it doesn't have > space for an additional work_item in the fcoe case. Maybe have > a per-cpu work_struct and keep all the list handling as-is for now? Okay. Let me try this. What about the few fixes from the series (which apply before the rework to smbboot theads)? Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/2016 03:30 PM, Sebastian Andrzej Siewior wrote: > On 03/15/2016 09:19 AM, Christoph Hellwig wrote: >> On Fri, Mar 11, 2016 at 05:32:15PM +0100, Sebastian Andrzej Siewior wrote: >>> alloc_workqueue() in setup and then queue_work_on(cpu, , item)? item >>> should be struct work_struct but all I have is a skb. Is there an easy >>> way to get this attached? >> >> Good question. There is skb->cb, but it looks like it doesn't have >> space for an additional work_item in the fcoe case. Maybe have >> a per-cpu work_struct and keep all the list handling as-is for now? > > Okay. Let me try this. What about the few fixes from the series (which > apply before the rework to smbboot theads)? okay kworker. This does not look good. I have it converted what I miss flushing work when CPU goes down and ensuring not to queue work while the CPU is down. - cpu_online(x) is racy. In DOWN_PREPARE the worker is deactivated / stopped. However slightly later the bit from the CPU mask is removed. - Whatever is queued and did not make it before the CPU went down seems to be delayed until the CPU comes back online. - if the worker keeps running while the CPU is going down the worker continues running on a different CPU. So I don't see how the former two points can be solved without keeping track of CPUs in a CPU notifier. Getting pushed to a different CPU be probably less of an issue if we would have a work-item and would not need to rely on the per-CPU list. Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c index 0efe7112fc1f..2b0d207f4b2b 100644 --- a/drivers/scsi/fcoe/fcoe.c +++ b/drivers/scsi/fcoe/fcoe.c @@ -2461,12 +2461,10 @@ static void fcoe_percpu_clean(struct fc_lport *lport) struct sk_buff *skb; unsigned int cpu; - for_each_possible_cpu(cpu) { + get_online_cpus(); + for_each_online_cpu(cpu) { pp = &per_cpu(fcoe_percpu, cpu); - if (!pp->thread || !cpu_online(cpu)) - continue; - skb = dev_alloc_skb(0); if (!skb) continue; @@ -2481,6 +2479,7 @@ static void fcoe_percpu_clean(struct fc_lport *lport) wait_for_completion(&fcoe_flush_completion); } + put_online_cpus(); } /**
for_each_possible_cpu() with a cpu_online() + `thread' check possibly does the job. But there is a tiny race: Say CPU5 is reported online but is going down. And after fcoe_percpu_clean() saw that CPU5 is online it decided to enqueue a packet. After dev_alloc_skb() returned a skb that CPU is offline (or say the notifier destroyed the kthread). So we would OOps because `thread' is NULL. An alternative would be to lock the CPUs during our loop (so no CPU is going away) and then we iterate over the online mask. Cc: Vasu Dev <vasu.dev@intel.com> Cc: "James E.J. Bottomley" <JBottomley@odin.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: fcoe-devel@open-fcoe.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- drivers/scsi/fcoe/fcoe.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)