mmc: core: add an error log for cid verification

Message ID	1550733306-9131-1-git-send-email-hongjiefang@asrmicro.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-mmc-owner@kernel.org> From: hongjiefang <hongjiefang@asrmicro.com> To: <ulf.hansson@linaro.org>, <shawn.lin@rock-chips.com>, <horms+renesas@verge.net.au>, <yinbo.zhu@nxp.com>, <harish_kandiga@mentor.com>, <kyle.roeschley@ni.com> CC: <linux-mmc@vger.kernel.org>, hongjiefang <hongjiefang@asrmicro.com> Subject: [PATCH] mmc: core: add an error log for cid verification Date: Thu, 21 Feb 2019 15:15:06 +0800 Message-ID: <1550733306-9131-1-git-send-email-hongjiefang@asrmicro.com> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk
Series	mmc: core: add an error log for cid verification \| expand mmc: core: add an error log for cid verification

Hongjie Fang Feb. 21, 2019, 7:15 a.m. UTC

when failed to verify the cid, the card initialisation will abort.
it is necessary to print some logs for this critical error.

Signed-off-by: hongjiefang <hongjiefang@asrmicro.com>
---
 drivers/mmc/core/mmc.c | 2 ++
 1 file changed, 2 insertions(+)

Ulf Hansson Feb. 27, 2019, 9:38 a.m. UTC | #1

On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote:
>
> when failed to verify the cid, the card initialisation will abort.
> it is necessary to print some logs for this critical error.

Under what circumstances do you observe this error? Is it during
system resume or during runtime resume?

Moreover, to get this error, I guess you are remove one card and
replacing it with another, right?

Just wanted to be sure of what kind of scenario you are
testing/looking at. Perhaps you could even fold in some of that
information in the changelog, that would be nice.

>
> Signed-off-by: hongjiefang <hongjiefang@asrmicro.com>
> ---
>  drivers/mmc/core/mmc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index da892a5..339015c 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1595,6 +1595,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
>         if (oldcard) {
>                 if (memcmp(cid, oldcard->raw_cid, sizeof(cid)) != 0) {
>                         err = -ENOENT;
> +                       pr_err("%s: Failed to verify the cid, error %d\n",
> +                               mmc_hostname(host), err);
>                         goto err;
>                 }
>
> --
> 1.9.1
>

Kind regards
Uffe

Hongjie Fang Feb. 27, 2019, 2:09 p.m. UTC | #2

> On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote:
> >
> > when failed to verify the cid, the card initialisation will abort.
> > it is necessary to print some logs for this critical error.
> 
> Under what circumstances do you observe this error? Is it during
> system resume or during runtime resume?
> 
> Moreover, to get this error, I guess you are remove one card and
> replacing it with another, right?

It happened when the system resume a non-removable eMMC card
from deep suspend.

> Just wanted to be sure of what kind of scenario you are
> testing/looking at. Perhaps you could even fold in some of that
> information in the changelog, that would be nice.

During the eMMC resume stage, CMD2 returned an error CID value
but without any error INT status.
The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000.
The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400.
"c0ff8080" is the response from CMD1.

This problem occasionally occurred when system resume from suspend status.
Maybe it happened because there is something wrong about bus clock or our host
when system do resume. It needs further debugging.

> 
> >
> > Signed-off-by: hongjiefang <hongjiefang@asrmicro.com>
> > ---
> >  drivers/mmc/core/mmc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> > index da892a5..339015c 100644
> > --- a/drivers/mmc/core/mmc.c
> > +++ b/drivers/mmc/core/mmc.c
> > @@ -1595,6 +1595,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
> >         if (oldcard) {
> >                 if (memcmp(cid, oldcard->raw_cid, sizeof(cid)) != 0) {
> >                         err = -ENOENT;
> > +                       pr_err("%s: Failed to verify the cid, error %d\n",
> > +                               mmc_hostname(host), err);
> >                         goto err;
> >                 }
> >
> > --
> > 1.9.1
> >
> 
> Kind regards
> Uffe


B&R
Hongjie

Ulf Hansson Feb. 27, 2019, 2:25 p.m. UTC | #3

On Wed, 27 Feb 2019 at 15:11, Fang Hongjie(方洪杰)
<hongjiefang@asrmicro.com> wrote:
>
> > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote:
> > >
> > > when failed to verify the cid, the card initialisation will abort.
> > > it is necessary to print some logs for this critical error.
> >
> > Under what circumstances do you observe this error? Is it during
> > system resume or during runtime resume?
> >
> > Moreover, to get this error, I guess you are remove one card and
> > replacing it with another, right?
>
> It happened when the system resume a non-removable eMMC card
> from deep suspend.
>
> > Just wanted to be sure of what kind of scenario you are
> > testing/looking at. Perhaps you could even fold in some of that
> > information in the changelog, that would be nice.
>
> During the eMMC resume stage, CMD2 returned an error CID value
> but without any error INT status.
> The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000.
> The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400.
> "c0ff8080" is the response from CMD1.

I see, that is indeed a weird error case. There is for sure a more
serious bug somewhere to look for.

I guess a overwriting the memory containing the response data,
somewhere from the host driver up until here is the is what one should
be looking for in the at first place.

Of course, another option is that the eMMC is internally behaving
in-correct, but then you need to have the debug equipment to be able
to listen and decode the bits transferred on the physical lines.

>
> This problem occasionally occurred when system resume from suspend status.
> Maybe it happened because there is something wrong about bus clock or our host
> when system do resume. It needs further debugging.

I see, but in such case, there should be an error printed in
mmc_bus_resume(), along the lines of "...card was removed?"...

Isn't that sufficient?

[...]

Kind regards
Uffe

Hongjie Fang Feb. 27, 2019, 2:49 p.m. UTC | #4

> > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com>
> wrote:
> > > >
> > > > when failed to verify the cid, the card initialisation will abort.
> > > > it is necessary to print some logs for this critical error.
> > >
> > > Under what circumstances do you observe this error? Is it during
> > > system resume or during runtime resume?
> > >
> > > Moreover, to get this error, I guess you are remove one card and
> > > replacing it with another, right?
> >
> > It happened when the system resume a non-removable eMMC card
> > from deep suspend.
> >
> > > Just wanted to be sure of what kind of scenario you are
> > > testing/looking at. Perhaps you could even fold in some of that
> > > information in the changelog, that would be nice.
> >
> > During the eMMC resume stage, CMD2 returned an error CID value
> > but without any error INT status.
> > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000.
> > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400.
> > "c0ff8080" is the response from CMD1.
> 
> I see, that is indeed a weird error case. There is for sure a more
> serious bug somewhere to look for.
> 
> I guess a overwriting the memory containing the response data,
> somewhere from the host driver up until here is the is what one should
> be looking for in the at first place.
> 
> Of course, another option is that the eMMC is internally behaving
> in-correct, but then you need to have the debug equipment to be able
> to listen and decode the bits transferred on the physical lines.
> 
> >
> > This problem occasionally occurred when system resume from suspend status.
> > Maybe it happened because there is something wrong about bus clock or our
> host
> > when system do resume. It needs further debugging.
> 
> I see, but in such case, there should be an error printed in
> mmc_bus_resume(), along the lines of "...card was removed?"...
> 
> Isn't that sufficient?

There is an error log " mmc0: error -2 doing runtime resume".
But there's no clear indication of where went wrong.
This patch wants to give a clear hint. It should be better.

> 
> [...]
> 
> Kind regards
> Uffe


B&R
Hongjie

Ulf Hansson Feb. 27, 2019, 5:33 p.m. UTC | #5

On Wed, 27 Feb 2019 at 15:49, Fang Hongjie(方洪杰)
<hongjiefang@asrmicro.com> wrote:
>
>
> > > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com>
> > wrote:
> > > > >
> > > > > when failed to verify the cid, the card initialisation will abort.
> > > > > it is necessary to print some logs for this critical error.
> > > >
> > > > Under what circumstances do you observe this error? Is it during
> > > > system resume or during runtime resume?
> > > >
> > > > Moreover, to get this error, I guess you are remove one card and
> > > > replacing it with another, right?
> > >
> > > It happened when the system resume a non-removable eMMC card
> > > from deep suspend.
> > >
> > > > Just wanted to be sure of what kind of scenario you are
> > > > testing/looking at. Perhaps you could even fold in some of that
> > > > information in the changelog, that would be nice.
> > >
> > > During the eMMC resume stage, CMD2 returned an error CID value
> > > but without any error INT status.
> > > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000.
> > > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400.
> > > "c0ff8080" is the response from CMD1.
> >
> > I see, that is indeed a weird error case. There is for sure a more
> > serious bug somewhere to look for.
> >
> > I guess a overwriting the memory containing the response data,
> > somewhere from the host driver up until here is the is what one should
> > be looking for in the at first place.
> >
> > Of course, another option is that the eMMC is internally behaving
> > in-correct, but then you need to have the debug equipment to be able
> > to listen and decode the bits transferred on the physical lines.
> >
> > >
> > > This problem occasionally occurred when system resume from suspend status.
> > > Maybe it happened because there is something wrong about bus clock or our
> > host
> > > when system do resume. It needs further debugging.
> >
> > I see, but in such case, there should be an error printed in
> > mmc_bus_resume(), along the lines of "...card was removed?"...
> >
> > Isn't that sufficient?
>
> There is an error log " mmc0: error -2 doing runtime resume".
> But there's no clear indication of where went wrong.
> This patch wants to give a clear hint. It should be better.

Well, the point is, that it may not be an error for some cases.

If you remove the card in suspended state and then insert a new one,
then when resume, the check detects that it's not the old card, thus
we removes the old card and re-detect the new once.
Perhaps printing a message on debug level seems more appropriate, does
that work for you?

Kind regards
Uffe

Hongjie Fang Feb. 28, 2019, 5:53 a.m. UTC | #6

> On Wed, 27 Feb 2019 at 15:49, Fang Hongjie(方洪杰)
> <hongjiefang@asrmicro.com> wrote:
> >
> >
> > > > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com>
> > > wrote:
> > > > > >
> > > > > > when failed to verify the cid, the card initialisation will abort.
> > > > > > it is necessary to print some logs for this critical error.
> > > > >
> > > > > Under what circumstances do you observe this error? Is it during
> > > > > system resume or during runtime resume?
> > > > >
> > > > > Moreover, to get this error, I guess you are remove one card and
> > > > > replacing it with another, right?
> > > >
> > > > It happened when the system resume a non-removable eMMC card
> > > > from deep suspend.
> > > >
> > > > > Just wanted to be sure of what kind of scenario you are
> > > > > testing/looking at. Perhaps you could even fold in some of that
> > > > > information in the changelog, that would be nice.
> > > >
> > > > During the eMMC resume stage, CMD2 returned an error CID value
> > > > but without any error INT status.
> > > > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000.
> > > > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400.
> > > > "c0ff8080" is the response from CMD1.
> > >
> > > I see, that is indeed a weird error case. There is for sure a more
> > > serious bug somewhere to look for.
> > >
> > > I guess a overwriting the memory containing the response data,
> > > somewhere from the host driver up until here is the is what one should
> > > be looking for in the at first place.
> > >
> > > Of course, another option is that the eMMC is internally behaving
> > > in-correct, but then you need to have the debug equipment to be able
> > > to listen and decode the bits transferred on the physical lines.
> > >
> > > >
> > > > This problem occasionally occurred when system resume from suspend
> status.
> > > > Maybe it happened because there is something wrong about bus clock or
> our
> > > host
> > > > when system do resume. It needs further debugging.
> > >
> > > I see, but in such case, there should be an error printed in
> > > mmc_bus_resume(), along the lines of "...card was removed?"...
> > >
> > > Isn't that sufficient?
> >
> > There is an error log " mmc0: error -2 doing runtime resume".
> > But there's no clear indication of where went wrong.
> > This patch wants to give a clear hint. It should be better.
> 
> Well, the point is, that it may not be an error for some cases.
> 
> If you remove the card in suspended state and then insert a new one,
> then when resume, the check detects that it's not the old card, thus
> we removes the old card and re-detect the new once.
> Perhaps printing a message on debug level seems more appropriate, does
> that work for you?

I see.
This patch need to be modified to change the log level.

> 
> Kind regards
> Uffe



B&R
Hongjie

mmc: core: add an error log for cid verification

Commit Message

Comments

Patch