ILP (instruction-level parallelism) and DLP (data-level parallelism) are two commonly used techniques for improving processor performance. SHARC+ processors offer both, and both can be leveraged.
Note: SHARC+ processors are (obviously) pipelined, so “single cycle” here doesn’t mean that an instruction takes 1 cycle from fetch to retire; it means “cycle” in the same sense as in IPC (instructions per cycle).
Many processors nowadays support multiple issue, meaning they can issue multiple instructions per cycle. General-purpose processors usually do this by checking dependencies at runtime to determine how many instructions can be issued, while DSPs like SHARC+ usually rely on this information being determined at compile time. In other words, the compiler or the assembly programmer decides which instructions are issued in each cycle, and how many.
Take a simple example: if I want to add two numbers together and multiply two other numbers together, on SHARC+ both operations can be executed in a single cycle, like below:
r0 = r1 * r4, r2 = r8 + r12;
As you can see, in SHARC+ assembly, instructions to be executed in the same cycle are separated by commas, and instructions to be executed in different cycles are separated by semicolons. For example, the following sequence would be executed in 2 cycles instead of one:
r0 = r1 * r4; r2 = r8 + r12;
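To make the comma/semicolon rule concrete, here is a small Python sketch that groups a snippet into per-cycle issue groups. It is only a toy model of the syntax rule described above, not an assembler: the function names are made up for illustration, and the only extra assumption is that commas inside parentheses (as in an addressing operand like dm(i0, m0)) belong to an operand rather than separating instructions.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    # Drop empty pieces, e.g. the one after the trailing semicolon.
    return [p.strip() for p in parts if p.strip()]


def issue_groups(asm):
    """Return one list per cycle; each list holds the operations issued together."""
    return [split_top_level(group, ",") for group in split_top_level(asm, ";")]


# One group means one cycle; two groups mean two cycles.
print(len(issue_groups("r0 = r1 * r4, r2 = r8 + r12;")))  # 1
print(len(issue_groups("r0 = r1 * r4; r2 = r8 + r12;")))  # 2
```

The number of groups is the number of issue cycles in this model; pipeline stalls are not modelled.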
Now let’s talk about the restrictions on multiple issue on SHARC+.
Multiple issue is limited by the resources available on the processor. For example, it can do one addition and one multiplication in a single cycle, because there is 1 adder and 1 multiplier, but not 2 additions, as there is only 1 adder.
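The resource limit can be sketched as a simple capacity check. The Python below is a toy model under stated assumptions: it only knows about the one adder and one multiplier mentioned above, it classifies an operation purely by looking for `*`, `+` or `-` in its text, and it ignores every other constraint a real SHARC+ assembler enforces (operand register restrictions, other compute units, and so on). All names are hypothetical.

```python
from collections import Counter

# Assumed per-cycle capacity, taken from the text: one adder, one multiplier.
UNIT_CAPACITY = {"adder": 1, "multiplier": 1}


def unit_for(op):
    """Map a toy operation such as 'r2 = r8 + r12' to the unit it occupies."""
    if "*" in op:
        return "multiplier"
    if "+" in op or "-" in op:
        return "adder"
    raise ValueError("operation not covered by this toy model: " + op)


def can_issue_together(ops):
    """True if no unit is asked to do more work than it has capacity for."""
    demand = Counter(unit_for(op) for op in ops)
    return all(count <= UNIT_CAPACITY[unit] for unit, count in demand.items())


print(can_issue_together(["r0 = r1 * r4", "r2 = r8 + r12"]))   # True
print(can_issue_together(["r0 = r8 + r12", "r2 = r9 + r13"]))  # False
```

In this model a group is legal only if each unit is used at most once, which is why the second call, with two additions, is rejected.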
As we mentioned before, there are 2 buses on SHARC: the DM bus and the PM bus. This allows one access on each bus per cycle. So one can do 2 loads or 2 stores in a single cycle, as long as the data resides in different SRAM blocks or the accesses hit in the cache; otherwise there will be stalls.
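The dual-access rule can also be written down as a small predicate. This Python sketch is one reading of the sentence above, not a description of the actual memory system: it assumes each access either targets a numbered SRAM block or goes through a cache, that a cache miss always stalls, and that two SRAM accesses stall only when they target the same block. The `Access` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Access:
    # SRAM block index, or None if the access goes through the cache (assumption).
    sram_block: Optional[int] = None
    # Only meaningful when sram_block is None.
    cache_hit: bool = False


def pair_stalls(dm, pm):
    """Toy prediction: does issuing one DM and one PM access together stall?"""
    for access in (dm, pm):
        if access.sram_block is None and not access.cache_hit:
            return True  # a cached access that misses has to wait for the fill
    if dm.sram_block is not None and dm.sram_block == pm.sram_block:
        return True  # both buses want the same SRAM block in the same cycle
    return False


print(pair_stalls(Access(sram_block=0), Access(sram_block=1)))  # False
print(pair_stalls(Access(sram_block=0), Access(sram_block=0)))  # True
```

When laying out buffers, the practical takeaway in this model is to place the two operands of a dual access in different blocks.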