Is there a script available for post processing some objdump --disassemble
output to annotate with cycle counts? Especially for the ARM family. Most of the time this would only be a pattern match with a table lookup for the count. I guess annotations like +5M
for five memory cycles might be needed. Perl, python, bash, C
, etc are fine. I think this can be done generically, but I am interested in the ARM, which has an orthogonal instruction set. Here is a thread on the 68HC11 doing the same thing. The script would need an CPU model option to select the appropriate cycle counts; I think these counts already exist in the gcc
machine description.
I don't think there is an objdump
switch for this, but RTFM would be great.
Edit: To clarify, assumptions such as best case memory sub-system as will be the case when the code executes from cache are fine. The goal is not a 100% accurate cycle count as per some running machine. It is possible to get a reasonable estimate, otherwise compiler design would be impossible.
As DWelch points out, a simple running total is not possible with deep pipelined architecture, like more recent Cortex chips. The objdump
post processing would have to look at surrounding opcodes. A gcc plug-in is more likely to be able to accomplish this and as that is new (4.5+), I don't think such a thing exists. A script for the ARM926 is certainly possible and fairly simple.
The memory latency doesn't matter. The memory controller is like another CPU
. It is doing it's business while the CPU is doing arithmetic, etc. A good/well tuned algorithm will parallel the memory accesses with the computations. By counting loads/store and cycles you can determine how much parallelism is accomplished, when you actively profile with a timer. The pipeline is significant due to interlocks between registers, but a cycle count for basic blocks can reliably be calculated and used even on modern ARM processors; this is too complex for a simple script.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…