Channel: SCN : Blog List - ABAP Development

My tips on how to handle complex and tricky issues


Symptoms of complex & tricky issues

 

During my seven years at SAP China, I have resolved hundreds of internal and customer tickets. Some kinds of tickets among them give me a headache:

 

 

1. The issue needs complex steps to reproduce

  For example, to reproduce one customer ticket I had to (1) create a new sales order, (2) create a new customer demand based on the sales order, (3) create a pick list and (4) release the generated delivery note. The issue could only be reproduced by releasing the delivery note, so each time I had to repeat the lengthy steps (1) ~ (3) before debugging the release of the delivery note.

 

2. Different software components involved

  I bet most of you have had this feeling: if the issue occurs purely within the component you are responsible for, you are always confident that it can be resolved sooner or later, since you own the API and are quite familiar with it. However, if your API is called by another software component, or from another system with a complex context, you have to spend more time to get a basic understanding of the whole story, to find out how your API is called, and to analyze whether your original API design can sustain a challenge you never thought about before.

 

3. The issue could only be reproduced in customer production system

  In most of the cases I have met, the reason is the data setup. For example, the test data in the customer test system is not well set up, so the erroneous code never has a chance to be executed there. Sometimes there are technical limitations or other reasons that make it impossible to ask the customer to set up exactly the same data in the test system as in the production system. The worst situation is when the issue occurs during a write operation, for example the pricing calculation is wrong when a business document is saved. In this case you cannot simply debug the save process, as it would influence the customer's business. You have to coordinate with the customer on how to proceed.

 

4. The issue could only be reproduced in background job execution mode but not in online mode

  The first step in checking such an issue is to look for function modules (FMs) or methods which should not be used in background execution, when no presentation server is attached.

 

5. The issue could only be reproduced in normal execution, but when you debug the program, everything works perfectly

  Every day I use the debugger to fight bugs. When a bug really exists but cannot be found via debugging, I feel helpless, since this powerful weapon cannot help me out this time. Then I have to read and analyze the code and run it in my brain. In most cases the issue finally turns out to be related to time-dependent processing in the program.

 

  As ABAPers we are lucky, since we do not often have to struggle with such time-dependent issues. When I was developing an Android application for SAP CRM customer briefing in 2012, I suffered a lot from this kind of issue. Just two examples:

 

  a. When you touch the Android tablet with a single finger and swipe, 5 or 6 different kinds of events are triggered sequentially. My event handlers registered for these events work with the coordinates of the events. Those coordinates become invalid if the code is stopped in the debugger, so I had to write many System.out.println calls to print the coordinates to the console for analysis.

 

  b. Deadlocks in multi-threading. Such issues are hard to reproduce via debugging.
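
The println approach from example a can be sketched as a small in-memory trace. This is a minimal Python sketch, not the original Android code: the class EventTrace, the handler on_touch_event and the sample coordinates are all hypothetical. The point is that the handler only appends a timestamped record and never stops, so the timing of the events is hardly disturbed and the trace can be read afterwards.

```python
import time

class EventTrace:
    """Collects timestamped records without pausing the program."""

    def __init__(self):
        self._records = []

    def record(self, name, **data):
        # Appending to a list is cheap, so event timing is barely affected.
        self._records.append((time.monotonic(), name, data))

    def dump(self):
        """Return one text line per record, with time relative to the first."""
        if not self._records:
            return []
        start = self._records[0][0]
        return [
            f"+{(stamp - start) * 1000:8.3f} ms {name} {data}"
            for stamp, name, data in self._records
        ]

trace = EventTrace()

def on_touch_event(kind, x, y):
    # Hypothetical event handler: log the coordinates instead of breaking.
    trace.record(kind, x=x, y=y)

# Simulated swipe: several events fired in quick succession.
for kind, x, y in [("down", 10, 10), ("move", 14, 12), ("move", 20, 15), ("up", 21, 15)]:
    on_touch_event(kind, x, y)

lines = trace.dump()
for line in lines:
    print(line)
```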

 

In fact, some issues do not fall into just one of the categories listed above but combine several of them. I have never encountered a customer issue which shows all five symptoms above, and I pray I will NEVER meet one.
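
For symptom 4, the first check can be partly automated. The following is a minimal Python sketch, not an SAP tool: it scans ABAP source text for CALL FUNCTION statements whose target is on a deny list of front-end function modules. The deny list (GUI_DOWNLOAD, GUI_UPLOAD, POPUP_TO_CONFIRM) is an illustrative assumption and certainly not complete, and the helper find_frontend_calls and the sample source are hypothetical.

```python
import re

# Illustrative assumption: a few function modules that need a presentation
# server or a user dialog. Extend this set for your own landscape.
FRONTEND_FMS = {"GUI_DOWNLOAD", "GUI_UPLOAD", "POPUP_TO_CONFIRM"}

CALL_PATTERN = re.compile(r"CALL\s+FUNCTION\s+'([A-Z0-9_/]+)'", re.IGNORECASE)

def find_frontend_calls(source):
    """Return (line number, FM name) for each call to a deny-listed FM."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Skip full-line ABAP comments, which start with '*' in column 1.
        if line.startswith("*"):
            continue
        for match in CALL_PATTERN.finditer(line):
            name = match.group(1).upper()
            if name in FRONTEND_FMS:
                hits.append((lineno, name))
    return hits

# Hypothetical source text, only for demonstration.
sample_source = """\
REPORT zdemo.
* CALL FUNCTION 'GUI_UPLOAD' is commented out.
CALL FUNCTION 'CRM_IBASE_COMP_GET_DETAIL'.
CALL FUNCTION 'GUI_DOWNLOAD'.
"""

for lineno, name in find_frontend_calls(sample_source):
    print(f"line {lineno}: {name} needs a front end")
```

Such a scan only finds static calls by literal name; dynamic calls and calls hidden in deeper layers still need the debugger or a code review.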

 

An example of how to resolve this kind of issue

 

Recently I worked on a ticket which took me almost 10 hours in total to resolve. I will share how I analyzed this issue step by step.

 

I am the owner of the SAP CRM IBASE component CRM-MD-INB. The issue was reported by a colleague from the Solution Management development team: when they create a new IBASE component, delete it afterwards in the same session and then save, an ST22 dump occurs in middleware processing.

 

clipboard1.png

I knew nothing about solution management development before.

This issue could only be reproduced in background execution (the program is designed to execute only in background).

The issue is not always reproducible. 囧

 

clipboard2.png

Step1 Understand how and when my API is called

 

I quickly went through the solution manager program, which contains tens of thousands of lines of code. I set breakpoints inside my API (the IBASE create, update and delete function modules), then identified all the call sites and the content of the importing parameters.
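
In ABAP this step is done with breakpoints and the call stack in the debugger. The same idea, recording who calls the API and with which parameters, can be sketched in Python with a decorator. This is an analogy only: record_callers, ibase_delete, save_handler and cleanup_job are hypothetical names, not part of any SAP system.

```python
import functools
import inspect

call_log = []

def record_callers(func):
    """Wrap an API so each call logs its direct caller and its arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        caller = inspect.stack()[1]  # frame of whoever called the wrapper
        call_log.append((func.__name__, caller.function, caller.lineno, args, kwargs))
        return func(*args, **kwargs)
    return wrapper

@record_callers
def ibase_delete(component_id):
    # Hypothetical stand-in for the real delete API.
    return f"deleted {component_id}"

# Two hypothetical call sites in an unfamiliar calling program.
def save_handler():
    return ibase_delete("C1")

def cleanup_job():
    return ibase_delete(component_id="C2")

save_handler()
cleanup_job()

for api, caller, lineno, args, kwargs in call_log:
    print(f"{api} called from {caller} (line {lineno}) with {args} {kwargs}")
```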

 

 

Step2 Write a simulation report and ensure the issue can be reproduced with it

 

As the scenario is really complex, with CRM, SOL and Middleware involved, I spent one hour debugging without finding any hint. Judging purely at code level, there are too many factors which impact the program. In order to concentrate on my API, I planned to develop a simulation report which also calls IBASE create, update and delete and then performs a save. The idea is to decouple the API calls from the solution manager logic. If the issue can also be reproduced in my simulation report, then life is easier - I only have to work on the simulation report, which contains only 200 lines.

 

I spent another hour finishing the simulation report. Unfortunately I could not reproduce the issue with it. After checking again with the issue reporter, I realized that the report did not simulate the IBASE operations of the real program 100%, and I changed it to close the gap.

The simulation report is uploaded as an attachment to this blog.

 

clipboard3.png

 

Since I own the simulation report, it is very convenient to change it for issue analysis.

 

a. comment out all IBASE related code.

b. uncomment IBASE component creation FM, and execute report - no dump

c. continue to uncomment IBASE component change FM, and execute - no dump

d. continue to uncomment IBASE component deletion FM, and execute - dumps!!!

 

So now I know this issue is related to IBASE deletion.
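
The procedure a to d above can be sketched as a small loop: enable one more step per run and stop at the first run that fails. The following Python sketch only illustrates the idea; the step functions create, change and delete are hypothetical stand-ins for the IBASE function modules, and the RuntimeError merely simulates the dump.

```python
def first_failing_step(steps):
    """Enable one more step per run; return the name of the step whose
    addition makes the run fail, or None if every run succeeds."""
    for count in range(1, len(steps) + 1):
        state = {}  # fresh state for every run, like restarting the report
        try:
            for _name, action in steps[:count]:
                action(state)
        except Exception:
            # All runs with fewer steps passed, so the newest step is suspect.
            return steps[count - 1][0]
    return None

# Hypothetical stand-ins for the three IBASE calls in the simulation report.
def create(state):
    state["component"] = {"text": "new"}

def change(state):
    state["component"]["text"] = "changed"

def delete(state):
    del state["component"]
    raise RuntimeError("simulated dump during save")

steps = [("create", create), ("change", change), ("delete", delete)]
print(first_failing_step(steps))
```

Starting every run from a fresh state matters: otherwise leftovers from an earlier run could hide or cause the failure.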

 

Step4 Investigation on ST22 dump

 

Now the issue can be reproduced during normal execution of the simulation report, but everything works perfectly during debugging.

My previous experience told me that it might be caused by some time-dependent processing logic in the code. So I checked the position of the code which raises the error (line 103) and found lots of time-related logic in the same include:

clipboard4.png

 

The aim of this include is to find the IBASE and fill it into es_ibinadm. It first checks the buffer table gt_ibinadm_by_in_guid; if that fails, it tries the FM in line 91 (in the first screenshot of this blog) as a last line of defence. Normally es_ibinadm is expected to be filled; in this issue, however, the last defence also fails, so the X message is raised. I set a breakpoint in this include, but during debugging the variable es_ibinadm was successfully filled in line 54 and everything worked perfectly. Yet the dump is indeed there when I execute the report directly.

 

So I ran the report once again and went to ST22 immediately, while the dump was still "fresh". The Debugger button is available only in this fresh state, and it lets me observe the variable values in the debugger at the moment the dump occurred.

 

clipboard5.png

I soon found the root cause: the valfr and valto of the buffer entry are the same.


clipboard6.png

So during normal execution the check in line 53 fails, and the code has to fall back to its last defence, the call of FM CRM_IBASE_COMP_GET_DETAIL. Given that, raising an X message there is the expected behavior; the real problem is that the entry in the buffer table should have been returned. When the code is executed in the debugger, valto is always greater than valfr, so the code returns the entry directly to its caller without calling FM CRM_IBASE_COMP_GET_DETAIL.


clipboard7.png

I will not go deep into the IBASE valfr and valto generation logic, as it is CRM specific and I am not an expert on it either. (By default, creating an IBASE component sets its valid-to timestamp to a never-expiring date (99991231235959). The comparison timestamp is set to the valid-from timestamp.)


clipboard8.png

After I added the following code to ensure that the check in line 53 in the above screenshot always succeeds, the issue was resolved - no more dump in background job execution.

clipboard9.png


I guess it would also work if the "<" were changed to "<=" in line 53. However, this code is owned by the Middleware software component and I cannot change it; maybe I can discuss it with the responsible colleague.
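
The effect of the strict comparison can be sketched in Python. The real check in line 53 is only visible in the screenshot, so the interval test below is a hypothetical reconstruction, and the names BufferEntry, is_valid_strict and is_valid_inclusive as well as the sample timestamps are assumptions. Timestamps have one-second resolution, in the same YYYYMMDDhhmmss form as the value 99991231235959 quoted above.

```python
from dataclasses import dataclass

@dataclass
class BufferEntry:
    valfr: int  # valid-from timestamp, YYYYMMDDhhmmss
    valto: int  # valid-to timestamp, YYYYMMDDhhmmss

def is_valid_strict(entry, ts):
    # Assumed shape of the original check: upper bound compared with "<".
    return entry.valfr <= ts < entry.valto

def is_valid_inclusive(entry, ts):
    # Same check with "<=" on the upper bound.
    return entry.valfr <= ts <= entry.valto

# Normal execution: create and delete happen within the same second,
# so valid-from and valid-to end up identical.
fast = BufferEntry(valfr=20140101120000, valto=20140101120000)

# Debugger: the pause between create and delete lets the clock move on.
slow = BufferEntry(valfr=20140101120000, valto=20140101120007)

# The comparison timestamp is the valid-from timestamp, as described above.
for label, entry in [("normal run", fast), ("debugger", slow)]:
    print(label,
          "strict:", is_valid_strict(entry, entry.valfr),
          "inclusive:", is_valid_inclusive(entry, entry.valfr))
```

With identical timestamps no value can satisfy the strict interval, which is why the buffer lookup fails only when the program runs at full speed.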


clipboard10.png


Summary

 

1. Benefit of simulation report

 

Although it took me 1 hour to develop the simulation report, I think it was definitely worth it, since it freed me from spending lots of time and effort debugging the unfamiliar solution management program and enabled me to concentrate on the core code which might be related to the dump.

Sometimes you have findings and need to change the code which calls your API for verification, but you cannot really do this since you do not own that code. In this case the simulation report plays its role: you can change it at will for verification.

 

2. The Mini-System methodology for issue-isolation

 

In the first decade of the 21st century it was very popular in China to assemble a PC yourself, DIY style: we bought the CPU, memory, hard disk, motherboard and other parts from different hardware manufacturers and assembled them. The most common issue was that after assembly the computer could not boot at all. Then we used a "Mini-System" for troubleshooting: as the first step we tried to boot the computer with only the LEAST necessary hardware (CPU + power supply + motherboard: these three components constitute the so-called "Mini-System"). If the first attempt succeeded, we added further components, but ensured that only ONE new component was added in EACH step. This iteration let us find which piece of hardware made the boot fail.

 

clipboard11.png

Compared with a computer system, our ABAP programs are much more complex, so issue isolation is all the more necessary for root cause investigation.

In my issue processing I used the "Mini-System" methodology to finally identify that the dump is related to the incorrect call of the IBASE delete function module.

 

3. Try to gain an overall perspective on the issue

 

In this issue processing I initially spent quite a lot of time debugging why function module CRM_IBASE_COMP_GET_DETAIL raises an X message.

Inside this FM, deeper APIs are called which are not owned by me, so I wasted lots of time understanding their logic. Later, after I read the whole source code of the include where CRM_IBASE_COMP_GET_DETAIL is called, I asked myself: should it be called at all? Why is it called in normal execution to get data from the DB, although the entry is already available in the buffer?

 

Do not think in isolation, think holistically

 

It makes sense to spend time and effort debugging the code where the exception is raised to understand why. It makes MORE sense to investigate the code holistically and analyze the call stack and execution context of the code. If the code (method or function module) fails to generate the output you expect, ask yourself: should it be called at all?

 

I hope this blog helps with your issue analysis. You are also welcome to share your tips and experience regarding tough issue processing.

