Message ID | 1550733306-9131-1-git-send-email-hongjiefang@asrmicro.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mmc: core: add an error log for cid verification | expand |
On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote: > > when failed to verify the cid, the card initialisation will abort. > it is necessary to print some logs for this critical error. Under what circumstances do you observe this error? Is it during system resume or during runtime resume? Moreover, to get this error, I guess you are remove one card and replacing it with another, right? Just wanted to be sure of what kind of scenario you are testing/looking at. Perhaps you could even fold in some of that information in the changelog, that would be nice. > > Signed-off-by: hongjiefang <hongjiefang@asrmicro.com> > --- > drivers/mmc/core/mmc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c > index da892a5..339015c 100644 > --- a/drivers/mmc/core/mmc.c > +++ b/drivers/mmc/core/mmc.c > @@ -1595,6 +1595,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr, > if (oldcard) { > if (memcmp(cid, oldcard->raw_cid, sizeof(cid)) != 0) { > err = -ENOENT; > + pr_err("%s: Failed to verify the cid, error %d\n", > + mmc_hostname(host), err); > goto err; > } > > -- > 1.9.1 > Kind regards Uffe
> On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote: > > > > when failed to verify the cid, the card initialisation will abort. > > it is necessary to print some logs for this critical error. > > Under what circumstances do you observe this error? Is it during > system resume or during runtime resume? > > Moreover, to get this error, I guess you are remove one card and > replacing it with another, right? It happened when the system resume a non-removable eMMC card from deep suspend. > Just wanted to be sure of what kind of scenario you are > testing/looking at. Perhaps you could even fold in some of that > information in the changelog, that would be nice. During the eMMC resume stage, CMD2 returned an error CID value but without any error INT status. The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000. The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400. "c0ff8080" is the response from CMD1. This problem occasionally occurred when system resume from suspend status. Maybe it happened because there is something wrong about bus clock or our host when system do resume. It needs further debugging. > > > > > Signed-off-by: hongjiefang <hongjiefang@asrmicro.com> > > --- > > drivers/mmc/core/mmc.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c > > index da892a5..339015c 100644 > > --- a/drivers/mmc/core/mmc.c > > +++ b/drivers/mmc/core/mmc.c > > @@ -1595,6 +1595,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr, > > if (oldcard) { > > if (memcmp(cid, oldcard->raw_cid, sizeof(cid)) != 0) { > > err = -ENOENT; > > + pr_err("%s: Failed to verify the cid, error %d\n", > > + mmc_hostname(host), err); > > goto err; > > } > > > > -- > > 1.9.1 > > > > Kind regards > Uffe B&R Hongjie
On Wed, 27 Feb 2019 at 15:11, Fang Hongjie(方洪杰) <hongjiefang@asrmicro.com> wrote: > > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> wrote: > > > > > > when failed to verify the cid, the card initialisation will abort. > > > it is necessary to print some logs for this critical error. > > > > Under what circumstances do you observe this error? Is it during > > system resume or during runtime resume? > > > > Moreover, to get this error, I guess you are remove one card and > > replacing it with another, right? > > It happened when the system resume a non-removable eMMC card > from deep suspend. > > > Just wanted to be sure of what kind of scenario you are > > testing/looking at. Perhaps you could even fold in some of that > > information in the changelog, that would be nice. > > During the eMMC resume stage, CMD2 returned an error CID value > but without any error INT status. > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000. > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400. > "c0ff8080" is the response from CMD1. I see, that is indeed a weird error case. There is for sure a more serious bug somewhere to look for. I guess a overwriting the memory containing the response data, somewhere from the host driver up until here is the is what one should be looking for in the at first place. Of course, another option is that the eMMC is internally behaving in-correct, but then you need to have the debug equipment to be able to listen and decode the bits transferred on the physical lines. > > This problem occasionally occurred when system resume from suspend status. > Maybe it happened because there is something wrong about bus clock or our host > when system do resume. It needs further debugging. I see, but in such case, there should be an error printed in mmc_bus_resume(), along the lines of "...card was removed?"... Isn't that sufficient? [...] Kind regards Uffe
> > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> > wrote: > > > > > > > > when failed to verify the cid, the card initialisation will abort. > > > > it is necessary to print some logs for this critical error. > > > > > > Under what circumstances do you observe this error? Is it during > > > system resume or during runtime resume? > > > > > > Moreover, to get this error, I guess you are remove one card and > > > replacing it with another, right? > > > > It happened when the system resume a non-removable eMMC card > > from deep suspend. > > > > > Just wanted to be sure of what kind of scenario you are > > > testing/looking at. Perhaps you could even fold in some of that > > > information in the changelog, that would be nice. > > > > During the eMMC resume stage, CMD2 returned an error CID value > > but without any error INT status. > > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000. > > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400. > > "c0ff8080" is the response from CMD1. > > I see, that is indeed a weird error case. There is for sure a more > serious bug somewhere to look for. > > I guess a overwriting the memory containing the response data, > somewhere from the host driver up until here is the is what one should > be looking for in the at first place. > > Of course, another option is that the eMMC is internally behaving > in-correct, but then you need to have the debug equipment to be able > to listen and decode the bits transferred on the physical lines. > > > > > This problem occasionally occurred when system resume from suspend status. > > Maybe it happened because there is something wrong about bus clock or our > host > > when system do resume. It needs further debugging. > > I see, but in such case, there should be an error printed in > mmc_bus_resume(), along the lines of "...card was removed?"... > > Isn't that sufficient? There is an error log " mmc0: error -2 doing runtime resume". But there's no clear indication of where went wrong. This patch wants to give a clear hint. It should be better. > > [...] > > Kind regards > Uffe B&R Hongjie
On Wed, 27 Feb 2019 at 15:49, Fang Hongjie(方洪杰) <hongjiefang@asrmicro.com> wrote: > > > > > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> > > wrote: > > > > > > > > > > when failed to verify the cid, the card initialisation will abort. > > > > > it is necessary to print some logs for this critical error. > > > > > > > > Under what circumstances do you observe this error? Is it during > > > > system resume or during runtime resume? > > > > > > > > Moreover, to get this error, I guess you are remove one card and > > > > replacing it with another, right? > > > > > > It happened when the system resume a non-removable eMMC card > > > from deep suspend. > > > > > > > Just wanted to be sure of what kind of scenario you are > > > > testing/looking at. Perhaps you could even fold in some of that > > > > information in the changelog, that would be nice. > > > > > > During the eMMC resume stage, CMD2 returned an error CID value > > > but without any error INT status. > > > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000. > > > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400. > > > "c0ff8080" is the response from CMD1. > > > > I see, that is indeed a weird error case. There is for sure a more > > serious bug somewhere to look for. > > > > I guess a overwriting the memory containing the response data, > > somewhere from the host driver up until here is the is what one should > > be looking for in the at first place. > > > > Of course, another option is that the eMMC is internally behaving > > in-correct, but then you need to have the debug equipment to be able > > to listen and decode the bits transferred on the physical lines. > > > > > > > > This problem occasionally occurred when system resume from suspend status. > > > Maybe it happened because there is something wrong about bus clock or our > > host > > > when system do resume. It needs further debugging. > > > > I see, but in such case, there should be an error printed in > > mmc_bus_resume(), along the lines of "...card was removed?"... > > > > Isn't that sufficient? > > There is an error log " mmc0: error -2 doing runtime resume". > But there's no clear indication of where went wrong. > This patch wants to give a clear hint. It should be better. Well, the point is, that it may not be an error for some cases. If you remove the card in suspended state and then insert a new one, then when resume, the check detects that it's not the old card, thus we removes the old card and re-detect the new once. Perhaps printing a message on debug level seems more appropriate, does that work for you? Kind regards Uffe
> On Wed, 27 Feb 2019 at 15:49, Fang Hongjie(方洪杰) > <hongjiefang@asrmicro.com> wrote: > > > > > > > > > On Thu, 21 Feb 2019 at 08:16, hongjiefang <hongjiefang@asrmicro.com> > > > wrote: > > > > > > > > > > > > when failed to verify the cid, the card initialisation will abort. > > > > > > it is necessary to print some logs for this critical error. > > > > > > > > > > Under what circumstances do you observe this error? Is it during > > > > > system resume or during runtime resume? > > > > > > > > > > Moreover, to get this error, I guess you are remove one card and > > > > > replacing it with another, right? > > > > > > > > It happened when the system resume a non-removable eMMC card > > > > from deep suspend. > > > > > > > > > Just wanted to be sure of what kind of scenario you are > > > > > testing/looking at. Perhaps you could even fold in some of that > > > > > information in the changelog, that would be nice. > > > > > > > > During the eMMC resume stage, CMD2 returned an error CID value > > > > but without any error INT status. > > > > The error CID values are as follow 15010044, 4436384d, 4202c2c0, ff808000. > > > > The correct value should be 15010044, 4436384d, 4202c2d8, dc2f9400. > > > > "c0ff8080" is the response from CMD1. > > > > > > I see, that is indeed a weird error case. There is for sure a more > > > serious bug somewhere to look for. > > > > > > I guess a overwriting the memory containing the response data, > > > somewhere from the host driver up until here is the is what one should > > > be looking for in the at first place. > > > > > > Of course, another option is that the eMMC is internally behaving > > > in-correct, but then you need to have the debug equipment to be able > > > to listen and decode the bits transferred on the physical lines. > > > > > > > > > > > This problem occasionally occurred when system resume from suspend > status. > > > > Maybe it happened because there is something wrong about bus clock or > our > > > host > > > > when system do resume. It needs further debugging. > > > > > > I see, but in such case, there should be an error printed in > > > mmc_bus_resume(), along the lines of "...card was removed?"... > > > > > > Isn't that sufficient? > > > > There is an error log " mmc0: error -2 doing runtime resume". > > But there's no clear indication of where went wrong. > > This patch wants to give a clear hint. It should be better. > > Well, the point is, that it may not be an error for some cases. > > If you remove the card in suspended state and then insert a new one, > then when resume, the check detects that it's not the old card, thus > we removes the old card and re-detect the new once. > Perhaps printing a message on debug level seems more appropriate, does > that work for you? I see. This patch need to be modified to change the log level. > > Kind regards > Uffe B&R Hongjie
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index da892a5..339015c 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1595,6 +1595,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr, if (oldcard) { if (memcmp(cid, oldcard->raw_cid, sizeof(cid)) != 0) { err = -ENOENT; + pr_err("%s: Failed to verify the cid, error %d\n", + mmc_hostname(host), err); goto err; }
when failed to verify the cid, the card initialisation will abort. it is necessary to print some logs for this critical error. Signed-off-by: hongjiefang <hongjiefang@asrmicro.com> --- drivers/mmc/core/mmc.c | 2 ++ 1 file changed, 2 insertions(+)