Follow these guidelines to optimize large programs:
Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections. You can also refer to Section 2.14 on KAP problems. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
gprof) to identify the program units that take the most time to run.
   If some time-intensive units have many iterative loops and arrays, then those units are good candidates for KAP loop optimizations. Go to step 4.
If these units are not good candidates, then lower-payoff optimizations, such as inlining, may still provide some performance improvement, especially where inlining inside loop nests also allows KAP to perform vectorization optimizations. In this case, go to step 6.
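The selection of time-intensive units described above can be sketched as follows. This is a minimal illustration in Python, not part of KAP or of any profiler: the function name `hot_units`, the 80% cutoff, and the unit names and timings are all hypothetical, and the per-unit times are assumed to have been copied by hand from a profiling report.

```python
def hot_units(profile, share=0.8):
    """Return the units that together account for `share` of total time, hottest first.

    `profile` maps a program unit name to its measured time in seconds.
    The 0.8 default is an arbitrary illustrative cutoff, not a KAP recommendation.
    """
    total = sum(profile.values())
    picked, running = [], 0.0
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in ranked:
        # Stop once the units picked so far already cover the requested share.
        if total == 0 or running / total >= share:
            break
        picked.append(name)
        running += seconds
    return picked


# Hypothetical timings, as they might be read off a profiling report.
profile = {"SOLVE": 41.0, "ASSEMBLE": 33.5, "READIN": 3.2, "REPORT": 1.3, "MAIN": 1.0}
print(hot_units(profile))  # ['SOLVE', 'ASSEMBLE']
```

The units this returns are only candidates by time. Whether they contain the loops and arrays that make them suitable for KAP loop optimizations still has to be checked by reading the source.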
/optimize=2), compile the whole program with the other qualifiers used in the best run from step 2, note the execution time, and verify the results.
   
   If the program fails, try again with the KAP qualifier /roundoff=0. If that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with /roundoff=0, try /scalaropt=1.
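To make "roundoff-sensitive" concrete, the short sketch below shows how regrouping a floating-point sum changes its result. It is a generic illustration in Python using IEEE double precision, with made-up values; it does not reproduce any specific KAP transformation, and it assumes only that an optimizer may regroup arithmetic in a way that is mathematically equivalent.

```python
# Floating-point addition is not associative: regrouping the same three
# terms changes the answer once the magnitudes differ enough.
big, small = 1.0e16, 1.0

as_written = (big + -big) + small    # big cancels first, small survives
reassociated = big + (-big + small)  # small is absorbed into -big and lost

print(as_written)    # 1.0
print(reassociated)  # 0.0
```

A program whose results are verified against values computed in the original order can therefore fail verification after such a regrouping, even though no individual operation is wrong.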
   
/roundoff=0 or /scalaropt=1, if needed.
   If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again. If you have success at this step, you can also try the suggestions found in Section 2.13.
/optimize=0 and /inline_and_copy=aaa,bbb,ccc,..., where aaa, bbb, and so forth, are the most frequently called routines from the profiling run in step 3.
   
   If this action succeeds, repeat with the /optimize=4 and /inline_and_copy=... qualifiers. If this action fails, try rerunning with /roundoff=0 or /scalaropt=1, or with fewer routines inlined. (See Section 2.14 for an explanation of binary chop.) Also, if you have success at this step, try the suggestions in Section 2.13.
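Reducing the list of inlined routines by binary chop can be sketched as below. This is a generic bisection written in Python and may differ in detail from the procedure in Section 2.14. The names `find_culprit` and `passes` are hypothetical: `passes` stands in for the manual cycle of rebuilding with only the given routines in the /inline_and_copy list and verifying the results. The sketch assumes that the full list is known to fail, that exactly one routine is responsible, and that any list containing it fails.

```python
def find_culprit(routines, passes):
    """Binary chop: narrow a failing inline list down to a single routine.

    `routines` is the list that is known to fail when inlined together.
    `passes(subset)` must return True when the program builds and verifies
    with only `subset` inlined (a stand-in for a manual build-and-run).
    """
    candidates = list(routines)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if passes(half):
            # First half is clean, so the culprit is in the second half.
            candidates = candidates[len(half):]
        else:
            # First half alone still fails, so the culprit is inside it.
            candidates = half
    return candidates[0] if candidates else None


# Simulated check: pretend that inlining CCC is what breaks the program.
def simulated_passes(subset):
    return "CCC" not in subset


print(find_culprit(["AAA", "BBB", "CCC", "DDD", "EEE"], simulated_passes))  # CCC
```

Each round halves the list, so five routines need at most three rebuilds. Once the offending routine is identified, the remaining routines can be kept in the inline list.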
   
