
[2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

Message ID 1547696703.4339.21.camel@Centos6.3-64 (mailing list archive)
State Superseded
Series [1/3] scsi: arcmsr: Use dma_alloc_coherent to replace dma_zalloc_coherent

Commit Message

ching Huang Jan. 17, 2019, 3:45 a.m. UTC
From Ching Huang <ching2048@areca.com.tw>

Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.

Signed-off-by: Ching Huang <ching2048@areca.com.tw>
---

Comments

Dan Carpenter Jan. 17, 2019, 7:59 a.m. UTC | #1
On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> >From Ching Huang <ching2048@areca.com.tw>
> 
> Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> 

What does this look like from a user perspective?  Does it fail every
time or does it only fail sometimes?

What's the bug exactly?

There is no Fixes tag...

> Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> ---
> 
> diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> index a94c513..b98c632 100755
> --- a/drivers/scsi/arcmsr/arcmsr.h
> +++ b/drivers/scsi/arcmsr/arcmsr.h
> @@ -508,9 +508,9 @@ struct MessageUnit_A
>  struct MessageUnit_B
>  {
>  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];

There is a well known rule of thumb that when someone uses "volatile"
in the kernel it means there is a locking problem...  Is this __iomem or
something?

>  	uint32_t	postq_index;
> -	uint32_t	doneq_index;
> +	volatile uint32_t	doneq_index;
>  	uint32_t	__iomem *drv2iop_doorbell;
>  	uint32_t	__iomem *drv2iop_doorbell_mask;
>  	uint32_t	__iomem *iop2drv_doorbell;
> diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
> index 5736434..88053b1 100755
> --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> @@ -1113,7 +1113,11 @@ static int arcmsr_resume(struct pci_dev *pdev)
>  	switch (acb->adapter_type) {
>  	case ACB_ADAPTER_TYPE_B: {
>  		struct MessageUnit_B *reg = acb->pmuB;
> -		reg->post_qbuffer[0] = 0;
> +		uint32_t i;
> +		for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
> +			reg->post_qbuffer[i] = 0;
> +			reg->done_qbuffer[i] = 0;
> +		}

Is this caused by patch 1 changing the zalloc to a regular alloc?  If so
then it should be folded into that patch instead of sent separately.

regards,
dan carpenter
ching Huang Jan. 17, 2019, 8:47 a.m. UTC | #2
On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > >From Ching Huang <ching2048@areca.com.tw>
> > 
> > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > 
> 
> What does this look like from a user perspective?  Does it fail every
> time or does it only fail sometimes?
> 
> What's the bug exactly?
> 
> There is no Fixes tag...
From the user's perspective, hibernate/resume works.
But subsequent IO may sometimes cause 'isr get an illegal ccb command' in
log/messages.
> 
> > Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> > ---
> > 
> > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > index a94c513..b98c632 100755
> > --- a/drivers/scsi/arcmsr/arcmsr.h
> > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > @@ -508,9 +508,9 @@ struct MessageUnit_A
> >  struct MessageUnit_B
> >  {
> >  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> 
> There is a well known rule of thumb that when someone uses "volatile"
> in the kernel it means there is a locking problem...  Is this __iomem or
> something?
The done_qbuffer is a command completion queue: an area written
by the IO processor and read by the device driver. So, ...
> 
> >  	uint32_t	postq_index;
> > -	uint32_t	doneq_index;
> > +	volatile uint32_t	doneq_index;
> >  	uint32_t	__iomem *drv2iop_doorbell;
> >  	uint32_t	__iomem *drv2iop_doorbell_mask;
> >  	uint32_t	__iomem *iop2drv_doorbell;
> > diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
> > index 5736434..88053b1 100755
> > --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> > @@ -1113,7 +1113,11 @@ static int arcmsr_resume(struct pci_dev *pdev)
> >  	switch (acb->adapter_type) {
> >  	case ACB_ADAPTER_TYPE_B: {
> >  		struct MessageUnit_B *reg = acb->pmuB;
> > -		reg->post_qbuffer[0] = 0;
> > +		uint32_t i;
> > +		for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
> > +			reg->post_qbuffer[i] = 0;
> > +			reg->done_qbuffer[i] = 0;
> > +		}
> 
> Is this cause by patch 1 changing the zalloc to regular alloc??  If so
> then it should be folded into that patch instead of sent separately.
Fully clearing the delivery and completion queues is what fixes
'isr get an illegal ccb command'. It has nothing to do with zalloc versus alloc.
> 
> regards,
> dan carpenter
> 
>
Dan Carpenter Jan. 17, 2019, 9:16 a.m. UTC | #3
On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > >From Ching Huang <ching2048@areca.com.tw>
> > > 
> > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > 
> > 
> > What does this look like from a user perspective?  Does it fail every
> > time or does it only fail sometimes?
> > 
> > What's the bug exactly?
> > 
> > There is no Fixes tag...
> >From user's perspective, hibernate/resume are OK.
> But following IO may cause 'isr get an illegal ccb command' in
> log/messages sometime.
> > 


You will need to resend with that information included in the commit
message.

> > > Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> > > ---
> > > 
> > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > index a94c513..b98c632 100755
> > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > >  struct MessageUnit_B
> > >  {
> > >  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > 
> > There is a well known rule of thumb that when someone uses "volatile"
> > in the kernel it means there is a locking problem...  Is this __iomem or
> > something?
> The done_qbuffer was a command completion queue, it was an area written
> by IO processor and read by device driver. So, ...

I'm not totally positive I understand this sentence.  I can find a bunch
of places which read from this buffer, but I haven't immediately found
which place writes to it.  Can you give me a function name that I should
read?

> > 
> > >  	uint32_t	postq_index;
> > > -	uint32_t	doneq_index;
> > > +	volatile uint32_t	doneq_index;

The volatile here is not right.  It's just normal memory.

regards,
dan carpenter
ching Huang Jan. 17, 2019, 9:52 a.m. UTC | #4
On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > >From Ching Huang <ching2048@areca.com.tw>
> > > > 
> > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > > 
> > > 
> > > What does this look like from a user perspective?  Does it fail every
> > > time or does it only fail sometimes?
> > > 
> > > What's the bug exactly?
> > > 
> > > There is no Fixes tag...
> > >From user's perspective, hibernate/resume are OK.
> > But following IO may cause 'isr get an illegal ccb command' in
> > log/messages sometime.
> > > 
> 
> 
> You will need to resend with that information included in the commit
> message.
OK. I will resend this patch later.
> 
> > > > Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> > > > ---
> > > > 
> > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > index a94c513..b98c632 100755
> > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > >  struct MessageUnit_B
> > > >  {
> > > >  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > 
> > > There is a well known rule of thumb that when someone uses "volatile"
> > > in the kernel it means there is a locking problem...  Is this __iomem or
> > > something?
> > The done_qbuffer was a command completion queue, it was an area written
> > by IO processor and read by device driver. So, ...
> 
> I'm not totally positive I understand this sentence.  I can find a bunch
> of places which read from this buffer, but I haven't immediately found
> which place writes to it.  Can you give me a function name that I should
> read?
Well, we allocate memory for struct MessageUnit_B in
arcmsr_alloc_ccb_pool() and record its DMA handle in acb->dma_coherent_handle2.
Then we tell the IO controller that DMA address in arcmsr_iop_confirm().
When a command completes, the controller's firmware writes a
completion ccb into done_qbuffer through DMA. So you won't see any driver
function writing to it.
> 
> > > 
> > > >  	uint32_t	postq_index;
> > > > -	uint32_t	doneq_index;
> > > > +	volatile uint32_t	doneq_index;
> 
> The volatile here is not right.  It's just normal memory.
Right, this volatile is not necessary.
> 
> regards,
> dan carpenter
ching Huang Jan. 22, 2019, 12:03 a.m. UTC | #5
On Tue, 2019-01-22 at 10:48 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 05:52:28PM +0800, Ching Huang wrote:
> > On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> > > On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > > > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > > > >From Ching Huang <ching2048@areca.com.tw>
> > > > > > 
> > > > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > > > > 
> > > > > 
> > > > > What does this look like from a user perspective?  Does it fail every
> > > > > time or does it only fail sometimes?
> > > > > 
> > > > > What's the bug exactly?
> > > > > 
> > > > > There is no Fixes tag...
> > > > >From user's perspective, hibernate/resume are OK.
> > > > But following IO may cause 'isr get an illegal ccb command' in
> > > > log/messages sometime.
> > > > > 
> > > 
> > > 
> > > You will need to resend with that information included in the commit
> > > message.
> > OK. I will resend this patch later.
> > > 
> > > > > > Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> > > > > > ---
> > > > > > 
> > > > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > > > index a94c513..b98c632 100755
> > > > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > > > >  struct MessageUnit_B
> > > > > >  {
> > > > > >  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > > -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > > +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > 
> > > > > There is a well known rule of thumb that when someone uses "volatile"
> > > > > in the kernel it means there is a locking problem...  Is this __iomem or
> > > > > something?
> > > > The done_qbuffer was a command completion queue, it was an area written
> > > > by IO processor and read by device driver. So, ...
> > > 
> > > I'm not totally positive I understand this sentence.  I can find a bunch
> > > of places which read from this buffer, but I haven't immediately found
> > > which place writes to it.  Can you give me a function name that I should
> > > read?
> > Well, we allocate memory for struct MessageUnit_B in
> > arcmsr_alloc_ccb_pool(), by assign to acb->dma_coherent_handle2.
> > Then we tell IO controller its DMA address in arcmsr_iop_confirm().
> > When a command was completed, controller's firmware program will write a
> > completion ccb in done_qbuffer through DMA. So, you can't see any driver
> > funtion write to it.
> 
> DMA memory doesn't need to be marked as volatile.
I see. So I have removed the volatile in patch v2.
> 
> regards,
> dan carpenter
>
Dan Carpenter Jan. 22, 2019, 7:48 a.m. UTC | #6
On Thu, Jan 17, 2019 at 05:52:28PM +0800, Ching Huang wrote:
> On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> > On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > > >From Ching Huang <ching2048@areca.com.tw>
> > > > > 
> > > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > > > 
> > > > 
> > > > What does this look like from a user perspective?  Does it fail every
> > > > time or does it only fail sometimes?
> > > > 
> > > > What's the bug exactly?
> > > > 
> > > > There is no Fixes tag...
> > > >From user's perspective, hibernate/resume are OK.
> > > But following IO may cause 'isr get an illegal ccb command' in
> > > log/messages sometime.
> > > > 
> > 
> > 
> > You will need to resend with that information included in the commit
> > message.
> OK. I will resend this patch later.
> > 
> > > > > Signed-off-by: Ching Huang <ching2048@areca.com.tw>
> > > > > ---
> > > > > 
> > > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > > index a94c513..b98c632 100755
> > > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > > >  struct MessageUnit_B
> > > > >  {
> > > > >  	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > -	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > +	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > 
> > > > There is a well known rule of thumb that when someone uses "volatile"
> > > > in the kernel it means there is a locking problem...  Is this __iomem or
> > > > something?
> > > The done_qbuffer was a command completion queue, it was an area written
> > > by IO processor and read by device driver. So, ...
> > 
> > I'm not totally positive I understand this sentence.  I can find a bunch
> > of places which read from this buffer, but I haven't immediately found
> > which place writes to it.  Can you give me a function name that I should
> > read?
> Well, we allocate memory for struct MessageUnit_B in
> arcmsr_alloc_ccb_pool(), by assign to acb->dma_coherent_handle2.
> Then we tell IO controller its DMA address in arcmsr_iop_confirm().
> When a command was completed, controller's firmware program will write a
> completion ccb in done_qbuffer through DMA. So, you can't see any driver
> funtion write to it.

DMA memory doesn't need to be marked as volatile.

regards,
dan carpenter
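
The point about volatile can be illustrated with a small user-space sketch. It is not the arcmsr code: the struct, the queue depth and every sketch_* name are hypothetical, and it assumes that a zero entry marks an empty completion slot. Instead of qualifying the whole field as volatile, the consumer reads each slot exactly once through a volatile-qualified access, which is similar in spirit to the kernel's READ_ONCE(). Real in-kernel code would also need whatever memory barriers the hardware requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue depth for this sketch only. */
#define SKETCH_DONEQ_DEPTH 8

struct sketch_done_ring {
	/* In the real driver this area is written by the device via DMA. */
	uint32_t done_qbuffer[SKETCH_DONEQ_DEPTH];
	/* Driver-private read cursor: plain memory, no volatile needed. */
	uint32_t doneq_index;
};

/* Read one slot exactly once through a volatile-qualified access. */
static uint32_t sketch_read_slot(const uint32_t *slot)
{
	return *(const volatile uint32_t *)slot;
}

/*
 * Drain completed entries into out[], stopping at the first empty
 * (zero) slot or after max entries. Returns the number consumed.
 */
static size_t sketch_drain(struct sketch_done_ring *ring,
			   uint32_t *out, size_t max)
{
	size_t n = 0;
	uint32_t idx;
	uint32_t entry;

	assert(ring != NULL && out != NULL);
	idx = ring->doneq_index;

	while (n < max &&
	       (entry = sketch_read_slot(&ring->done_qbuffer[idx])) != 0) {
		ring->done_qbuffer[idx] = 0;	/* mark the slot empty again */
		out[n++] = entry;
		idx = (idx + 1) % SKETCH_DONEQ_DEPTH;
	}
	ring->doneq_index = idx;
	return n;
}
```

Under the zero-means-empty assumption, a stale non-zero entry left in the ring across suspend/resume would be consumed as if it were a fresh completion, which is consistent with the 'isr get an illegal ccb command' symptom described earlier in the thread.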

Patch

diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
index a94c513..b98c632 100755
--- a/drivers/scsi/arcmsr/arcmsr.h
+++ b/drivers/scsi/arcmsr/arcmsr.h
@@ -508,9 +508,9 @@  struct MessageUnit_A
 struct MessageUnit_B
 {
 	uint32_t	post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
-	uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
+	volatile uint32_t	done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
 	uint32_t	postq_index;
-	uint32_t	doneq_index;
+	volatile uint32_t	doneq_index;
 	uint32_t	__iomem *drv2iop_doorbell;
 	uint32_t	__iomem *drv2iop_doorbell_mask;
 	uint32_t	__iomem *iop2drv_doorbell;
diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 5736434..88053b1 100755
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -1113,7 +1113,11 @@  static int arcmsr_resume(struct pci_dev *pdev)
 	switch (acb->adapter_type) {
 	case ACB_ADAPTER_TYPE_B: {
 		struct MessageUnit_B *reg = acb->pmuB;
-		reg->post_qbuffer[0] = 0;
+		uint32_t i;
+		for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
+			reg->post_qbuffer[i] = 0;
+			reg->done_qbuffer[i] = 0;
+		}
 		reg->postq_index = 0;
 		reg->doneq_index = 0;
 		break;
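
The resume-time reset in the hunk above can be modelled outside the kernel as follows. This is a sketch under stated assumptions, not the driver's implementation: the struct is a cut-down stand-in for struct MessageUnit_B, SKETCH_POSTQUEUE_DEPTH is a made-up depth rather than the value of ARCMSR_MAX_HBB_POSTQUEUE, and memset() is swapped in for the per-element loop used by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical queue depth; the real value is defined in arcmsr.h. */
#define SKETCH_POSTQUEUE_DEPTH 8

/* Cut-down stand-in for the queue fields of struct MessageUnit_B. */
struct sketch_message_unit_b {
	uint32_t post_qbuffer[SKETCH_POSTQUEUE_DEPTH];
	uint32_t done_qbuffer[SKETCH_POSTQUEUE_DEPTH];
	uint32_t postq_index;
	uint32_t doneq_index;
};

/*
 * Clear every slot of both queues and rewind both cursors, so that no
 * entry from before the suspend can be mistaken for a new one.
 */
static void sketch_reset_queues(struct sketch_message_unit_b *mu)
{
	assert(mu != NULL);
	memset(mu->post_qbuffer, 0, sizeof(mu->post_qbuffer));
	memset(mu->done_qbuffer, 0, sizeof(mu->done_qbuffer));
	mu->postq_index = 0;
	mu->doneq_index = 0;
}
```

The pre-patch code cleared only post_qbuffer[0]; the sketch, like the patch, clears every slot of both queues before rewinding the indices.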