Intersil 80C286 Performance Advantages
Over the 80386
Application Note
March 1997
AN111.1
Introduction
The Intersil 80C286, operating at the same frequency as the
80386, has performance advantages over the 80386 when
executing 16-bit industry standard 80C86 or 80C286 code.
This is evident in the following areas:
(1) Input/Output Handling
(2) Interrupt Handling
(3) Control Transfer (Loop, Jump, Call)
(4) 286 Protected Mode Systems
(5) Multi-Tasking and Task Switching Operations.
This advantage arises because the 80C286 requires fewer
clock cycles to execute the same instructions. In addition to
these areas, the 80C286 executes many other instructions in
the same number of clock cycles as the 80386.
This results in an 80C286 performance advantage in areas
including:
• Multi-Tasking Systems.
• Control Applications - utilizing interrupt and I/O
instructions.
Architecture Background
The 80C286, Intersil’s newest static CMOS microprocessor,
combines low operating and standby power with high
performance. The Intersil 80C286 is available in speeds of
12.5MHz, 16MHz, 20MHz and 25MHz.
The 80C286 evolved from the industry standard 80C86
microprocessor. The 80C286 has vast architectural
enhancements over its predecessor that allow the 80C286 to
execute the same code with a significant performance
increase. Disregarding the clock speed increase, when
upgrading from an 80C86 to an 80C286, the 80C286 can
execute the same code with an increase in throughput of up
to 4 times that of the 80C86. This increase is solely due to
the architectural enhancements.
It is common belief that replacing an 80C286 with the 32-bit
80386 microprocessor will yield similar performance
increases. This is not the case. The new architecture gives
the 80386 32-bit capability and increased protection
features, but it does not significantly increase the throughput
of 16-bit 8086 or 80286 code. In most cases, when
executing industry standard 8086 or 80286 code, replacing
the 80C286 with an 80386 does not result in a significant
performance increase. In some cases, such a replacement
will actually cause a performance degradation.
Figure 1 illustrates a comparison of the number of clock
cycles needed to execute several instructions available on
all three microprocessors (80C86, 80C286, and 80386). This
illustrates the dramatic effect of 80C286 architectural
enhancements on performance when compared to the
80C86 and the lack of similar performance improvement
when executing 8086/80286 code on the 80386.
With an 80C86 to 80C286 upgrade, system designers can
execute existing 8086 code on the 80C286 and take
advantage of an immediate performance upgrade. This
same benefit is not realized when switching from an 80C286
to an 80386. This comparison illustrates that changing from
an 80C286 to an 80386 does not yield a comparable throughput
increase when executing the same industry standard
80C86/80C286 code (the world’s largest base of microprocessor
software).
Further areas in which the 80C286 holds a performance
advantage include:
• Structured Software - utilizing many control transfer
instructions.
• Operating Systems that rely on interrupts to perform
functions.
• Upgrading 16-bit 80C86 applications for increased
performance.
The 80C286 can be effectively used as a fast 80C86.
However, the 80386 is not a fast 80C286. This study shows
that software written for the 80C86/80C286 can execute
more efficiently on the 80C286 than on the 80386. There is
no significant performance advantage to be gained by
simply moving a system design from an 80C286 to an 80386
at either 16MHz or 20MHz. The 80C286 is the processor
best suited for executing 16-bit 80C86/80C286 code, which
represents the world’s largest base of microprocessor
software.
1-888-INTERSIL or 321-724-7143
Intersil (and design) is a trademark of Intersil Americas Inc.
Copyright © Intersil Americas Inc. 2001. All Rights Reserved
Application Note 111
[Figure 1: bar chart of the number of clocks needed to execute
each instruction (specified values in parentheses) on the
80C86, 80C286, and 80386. Instructions compared: MUL (BX);
XOR AX, (BP) (SI); OUT PORT, AX; NOT (BX + 10); CALL near;
LOOP; INT 3; AAD; ADD mem (BX) (DI), AX.
Totals: 80C86 (353), 80C286 (101), 80386 (119).]
FIGURE 1. ARCHITECTURAL COMPARISON
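The totals in Figure 1 can be turned into relative clock counts with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes that the three totals belong to the 80C86 (353), the 80C286 (101), and the 80386 (119), and the function name is hypothetical.

```python
# Total clocks for the nine Figure 1 instructions, as read from the figure.
# Assumption: 353 is the 80C86 total, 101 the 80C286 total,
# and 119 the 80386 total.
FIGURE_1_TOTALS = {"80C86": 353, "80C286": 101, "80386": 119}


def clock_ratio(slower, faster, totals=FIGURE_1_TOTALS):
    """Return how many times more clocks `slower` needs than `faster`."""
    return totals[slower] / totals[faster]


if __name__ == "__main__":
    # 80C86 versus 80C286 for the same instruction mix.
    print(f"80C86 / 80C286: {clock_ratio('80C86', '80C286'):.2f}")
    # 80386 versus 80C286 for the same instruction mix.
    print(f"80386 / 80C286: {clock_ratio('80386', '80C286'):.2f}")
```

Under these assumptions the 80C86 needs roughly three and a half times as many clocks as the 80C286 for this instruction mix, while the 80386 needs somewhat more clocks than the 80C286 rather than fewer.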
Instruction Comparison
The Appendix in this document illustrates a direct
comparison of the number of clock cycles needed to execute the
same instructions on the 80C286 and the 80386. The table
includes examples of instruction timing for all instructions
available on both processors. Several addressing modes of
each instruction type are included.
Of the 190 instruction examples analyzed, 74 of the
instructions execute faster on the 80C286 than on the
80386; 66 of the instructions analyzed execute in the same
number of clock cycles on both processors. This leaves only
50 instructions with improved performance on the 80386
(See Figure 2). Over 70% of the instructions analyzed
execute as fast or faster on the 80C286 than on the 80386.
This is vastly different from the previous 80C86 to 80C286
upgrade. With that upgrade, the 80C286 exhibits equal or better
performance than the 80C86 with 100% of the instructions. This
clearly indicates the 80C286 is the processor best suited for
executing industry standard 8086 family code.
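The "over 70%" figure follows directly from the three counts quoted above. A minimal Python sketch of that arithmetic is shown here; the constant and function names are hypothetical, and only the counts 74, 66, and 50 come from the text.

```python
# Instruction-example counts quoted in the text (190 examples in total).
FASTER_ON_80C286 = 74
SAME_ON_BOTH = 66
FASTER_ON_80386 = 50


def share_as_fast_or_faster():
    """Fraction of examples that run as fast or faster on the 80C286."""
    total = FASTER_ON_80C286 + SAME_ON_BOTH + FASTER_ON_80386
    return (FASTER_ON_80C286 + SAME_ON_BOTH) / total


if __name__ == "__main__":
    # Prints the share as a percentage with one decimal place.
    print(f"{share_as_fast_or_faster():.1%}")
```

The result is a little under 74%, which is consistent with the "over 70%" stated in the text.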
The following discussion groups each of the instructions into
one of several categories to analyze which applications will
benefit from utilizing the 80C286. The categories used are:
• Jumps, Calls, Returns and Loops (Real Mode)
• I/O Instructions
• Logic, Arithmetic, Data Transfer, Shift and Rotate
Instructions
• Interrupts
• Miscellaneous Instructions
• Protected Mode/Multi-Tasking Instructions
[Figure 2: chart of the 190 instruction examples by execution
speed. 80C286 faster than 80386: 74. 80C286 equal to 80386: 66.
80386 faster than 80C286: 50.]
FIGURE 2. EXECUTION SPEED COMPARISON (NUMBER OF
INSTRUCTIONS)
Jumps, Calls, and Loops
In real mode, near calls, jumps, and conditional jumps
(transfers within the current code segment) all take the same
number of clock cycles to execute on the 80C286 and the
80386. Since the segment sizes are larger on the 80386, the
near transfer instructions on the 80386 can transfer a
greater distance.
The far calls and jumps (transfers that switch to a new code
segment; i.e., a code segment context switch) are faster on
the 80C286 by four clocks and one clock, respectively. The far
return instruction executes in three fewer clock cycles on the
80C286, while the near return takes one extra clock cycle on
the 80C286. The protected mode calls, jumps, and returns are
all faster on the 80C286 and are discussed in the section on
Protected Mode.
The LOOP instruction is three clock cycles faster on the
80C286 than on the 80386. Thus, the 80C286 would save 300
clock cycles over the 80386 if a LOOP instruction were
executed 100 times.
INSTRUCTION                    ADVANTAGE
Near JMP and CALL              NONE
Far CALL, JMP and RET          80C286
LOOP                           80C286
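The per-instruction differences quoted in this section can be combined to estimate the net saving for a given mix of control transfers. The Python sketch below is a rough illustration under that assumption. The dictionary keys and the function name are hypothetical, and the deltas are the real mode figures from the text (a negative value means the 80386 is faster).

```python
# Clock cycles saved on the 80C286 relative to the 80386 for each
# execution, as quoted in the text for real mode.
# A negative value means the 80386 is the faster processor.
SAVED_PER_EXECUTION = {
    "far_call": 4,
    "far_jmp": 1,
    "far_ret": 3,
    "near_ret": -1,
    "loop": 3,
}


def clocks_saved(counts):
    """Net clocks saved on the 80C286 for a mix of control transfers.

    `counts` maps an instruction name to the number of executions.
    """
    unknown = set(counts) - set(SAVED_PER_EXECUTION)
    if unknown:
        raise KeyError(f"no timing difference recorded for: {sorted(unknown)}")
    return sum(SAVED_PER_EXECUTION[name] * n for name, n in counts.items())


if __name__ == "__main__":
    # The LOOP case from the text: 100 executions save 300 clocks.
    print(clocks_saved({"loop": 100}))
```

Near calls and jumps are omitted because the text states that they take the same number of clocks on both processors.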
Logic, Arithmetic, Data Transfer, Shift and
Rotate Instructions
Most forms of logic, arithmetic, and data transfer instructions
execute in the same number of clock cycles on both
processors. Certain operand combinations of these
instructions (immediate to register, for example) take one
extra clock to execute on the 80C286.
In real mode, the segment register transfer instructions
execute as fast or faster on the 80C286 than they do on the
80386. For example, using the POP instruction to transfer
data into a segment register is 2 clock cycles faster on the
80C286.
Most of the string manipulation instructions execute in the
same number of clock cycles on both processors. The
MOVS and STOS instructions are faster on the 80C286.
The divide instruction executes in the same number of clock
cycles on both processors. The number of clocks to execute
the multiply instruction on the 80386 is data dependent; the
number of clocks to execute the same instruction on the
80C286 is fixed. On average, the multiply instruction is five
clock cycles faster on the 80386, but depending on the data,
the 80386 could be as many as 4 clock cycles slower than the
80C286.
The rotate and shift instructions are faster on the 80386.
Unlike the 80C286, the 80386 rotate and shift instructions do
not depend on the number of bits to be shifted or rotated.
Thus, the 80386 has the advantage with multi-bit rotate and
shift instructions. The 80C286 does, however, execute single
bit rotate and shift instructions faster.
INSTRUCTION                             ADVANTAGE
Most Logic and Arithmetic               NONE
Certain Operand Combinations
of Logic and Arithmetic                 80386
Divide                                  NONE
Multiply                                80386
Single Bit Shift or Rotate              80C286
Multi-Bit Shift or Rotate               80386
String Instructions                     NONE (80C286 for MOVS and STOS)
I/O Instructions
The 80C286 has a significant advantage with the I/O
instructions. The IN instruction is almost 2 1/2 times faster
on the 80C286; the 80386 takes 7 extra clock cycles to
execute the same instruction. The OUT instruction is over 3
times faster on the 80C286; again the 80386 takes 7 extra
clock cycles to execute the same instruction. Executing the
I/O instructions on the 80386 is equivalent to executing on
the 80C286 with 7 wait states.
The string I/O instructions (INS and OUTS) are also
significantly faster on the 80C286. The INS instruction is 10
clock cycles faster on the 80C286, and the OUTS instruction is
9 clock cycles faster. This is particularly important if the
string operations are going to be used to input and output a
large block of data using the REP prefix. Inputting 100 words
of data with the REP INS instruction is 208 clock cycles faster
on the 80C286. An even more significant difference can be
seen when outputting 100 words with the REP OUTS
instruction. In this case, the 80C286 is 800 clock cycles faster
than the 80386.
INSTRUCTION                             ADVANTAGE
IN                                      80C286
OUT                                     80C286
INS                                     80C286
OUTS                                    80C286
Interrupt Instructions
Interrupts are serviced more quickly on the 80C286. The INT
instruction, in real mode, executes 14 cycles faster on the
80C286 than it does on the 80386. The INTO, BOUND, and
other instructions that can cause an interrupt all benefit from
the faster interrupt handling features of the 80C286. The
return from interrupt instruction (IRET) is 7 clock cycles
faster on the 80C286. The PUSHA and POPA instructions,
frequently used by interrupt handling procedures, are both
faster on the 80C286. Protected Mode interrupt handling is
discussed in the Protected Mode section.
INSTRUCTION                             ADVANTAGE
INT n                                   80C286
INTO                                    80C286
BOUND (If Interrupt)                    80C286
Break Point Interrupt                   80C286
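The REP INS and REP OUTS savings quoted in the I/O Instructions section (208 and 800 clocks for 100 words) can be reproduced with simple linear timing formulas. The Python sketch below is illustrative only. The formulas are assumptions recalled from the 80286 and 80386 programmer's reference manuals for real mode and are not given in this note; they are used here because they reproduce the figures in the text. All function names are hypothetical.

```python
# Assumed real mode timings for a repeated string I/O instruction that
# transfers n items. These formulas are NOT taken from this note.
def rep_ins_clocks_80c286(n):
    return 5 + 4 * n


def rep_ins_clocks_80386(n):
    return 13 + 6 * n


def rep_outs_clocks_80c286(n):
    return 5 + 4 * n


def rep_outs_clocks_80386(n):
    return 5 + 12 * n


def rep_ins_saving(n):
    """Clocks saved on the 80C286 when inputting n items with REP INS."""
    return rep_ins_clocks_80386(n) - rep_ins_clocks_80c286(n)


def rep_outs_saving(n):
    """Clocks saved on the 80C286 when outputting n items with REP OUTS."""
    return rep_outs_clocks_80386(n) - rep_outs_clocks_80c286(n)


if __name__ == "__main__":
    print(rep_ins_saving(100))   # saving for 100 words with REP INS
    print(rep_outs_saving(100))  # saving for 100 words with REP OUTS
```

Because both formulas are linear in n, the saving grows in proportion to the block size, which is why the advantage matters most for large transfers.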
Miscellaneous Instructions
The BCD instructions, HLT, and CBW execute from 1 to 5
clock cycles faster on the 80C286. The instructions to set
and clear individual flags and the CWD instruction all
execute in the same number of cycles on both processors.
The ENTER, LEAVE, and BOUND instructions are from 1 to
3 cycles faster on the 80386. The BOUND instruction is
faster on the 80386 only if it does not cause an interrupt.
INSTRUCTION                             ADVANTAGE
BCD Instructions                        80C286
Data Conversion (CBW, CWD)              80C286 (CBW), NONE (CWD)
Flag Setting and Clearing               NONE
BOUND (If No Interrupt)                 80386
Protected Mode/Multi-Tasking
When executing 80286 protected mode code, the 80C286
significantly outperforms the 80386. Task switching
operations execute 100 to 113 clock cycles faster on the
80C286. The instruction to return from a called task is 63
clock cycles faster on the 80C286. This results in a very
significant performance increase for systems utilizing the
multi-tasking features.
Inter-segment JMP, CALL and segment loading instructions
also operate faster on the 80C286. The 80C286 saves
anywhere from 4 to 11 clock cycles depending on the particular
inter-segment transfer instruction. In protected mode, the
inter-segment return is also faster on the 80C286. The
80C286 is 7 clock cycles faster when executing an
inter-segment return to the same privilege level and is 13 cycles
faster on inter-segment returns to a different privilege level.
The instructions to initialize and check the protected mode
registers execute as fast or faster on the 80C286. The IDTR
access instructions are an exception to this in that they take
one extra clock cycle to execute on the 80C286. The
instruction to switch the processor to protected mode
(LMSW) is 7 cycles faster on the 80C286.
Most of the 80286 protected mode access checking
instructions operate as fast or faster on the 80C286 than on the
80386. The LAR instruction is one clock cycle faster on the
80C286 and the LSL instruction is 5 clock cycles faster. The
VERW instruction executes at the same speed on both
processors and the VERR is 5 cycles faster on the 80386. The
ARPL instruction used in protected mode procedures for
pointer validation is 10 clock cycles faster on the 80C286.
INSTRUCTION                             ADVANTAGE
Task Switching                          80C286
Segment Register Loading                80C286
Inter-Segment Transfer                  80C286
System Register Instructions            80C286
Inter-Segment Transfers                 80C286
Access Checking Instructions            80C286
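To put the task switching figures in perspective, the clock savings can be converted into time. The Python sketch below is a rough estimate only. It assumes that the quoted clock counts are processor clocks at the rated frequency, and the switch rate of 1000 task switches per second is a hypothetical workload, not a figure from this note. The function name is also hypothetical.

```python
def microseconds_saved(task_switches, clocks_saved_per_switch, clock_mhz):
    """Time saved, in microseconds, for a number of task switches.

    One clock at `clock_mhz` MHz lasts 1 / clock_mhz microseconds.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return task_switches * clocks_saved_per_switch / clock_mhz


if __name__ == "__main__":
    # Hypothetical workload: 1000 task switches per second at 16MHz,
    # using the 100 to 113 clock range quoted in the text.
    print(microseconds_saved(1000, 100, 16))
    print(microseconds_saved(1000, 113, 16))
```

The same function can be applied to the 63 clock saving quoted for the return from a called task.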
Subroutine Analysis
This section lists several subroutines and then compares the
number of clock cycles each subroutine will take to execute
on the 80C286 and on the 80386.
Example 1
This interrupt routine outputs a character to a terminal via a
UART. The AL register must contain the character to be
output. The routine first checks the status of the UART to
determine if it is busy. If it is busy, the routine loops until the
UART is free; when the UART is free, the character is
output. Following is a listing of the code and the clock cycle
analysis for the OUT_CHAR routine.
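This sample procedure takes about 25% to 30% fewer clock
cycles on the 80C286 than on the 80386: 73 versus 104 clock
cycles when the UART is not busy, plus 18 versus 24 for each
loop while the UART is busy. The advantage is realized through
the 80C286’s faster interrupt handling and faster I/O
instructions.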
                                                                    80C286   80386
                                                                    CLOCK    CLOCK
                                                                    CYCLES   CYCLES
OUT_CHARACTER PROC NEAR
            PUSHF                 ; Save caller’s flags.            3        4
            PUSH AX               ; Save data to be output.         3        2
CK_STATUS:  IN AL, PORT_STATUS    ; Input UART status.              5        12
            CMP AL, BUSY          ; Check if UART busy.             6        5
            JE CK_STATUS          ; If busy go check again.         3/7      3/7
            POP AX                ; If not busy restore AX          5        4
            OUT OUT_PORT, AL      ; and output data.                3        10
            POPF                  ; Restore flags.                  5        5
            IRET                  ; Return.                         17       22
            INT x                 ; Instruction to initiate OUTCHAR 23       37
                                  ; interrupt.
                                                                    ____     ____
Total cycles if UART not busy.                                      73       104
Number of cycles added for each loop while UART is busy.            18       24
EXAMPLE 1.
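The two summary rows of Example 1 define a simple linear cost model: a fixed cost when the UART is not busy, plus a per-loop cost for each pass through the busy-wait loop. The Python sketch below expresses that model. The names are hypothetical, and the four numbers are the totals from the Example 1 listing.

```python
# (clocks if UART not busy, clocks added per busy-wait loop),
# taken from the Example 1 totals.
EXAMPLE_1 = {"80C286": (73, 18), "80386": (104, 24)}


def out_char_clocks(cpu, busy_loops=0):
    """Clocks to output one character after `busy_loops` busy-wait loops."""
    if busy_loops < 0:
        raise ValueError("busy_loops must not be negative")
    base, per_loop = EXAMPLE_1[cpu]
    return base + per_loop * busy_loops


if __name__ == "__main__":
    for loops in (0, 5):
        c286 = out_char_clocks("80C286", loops)
        c386 = out_char_clocks("80386", loops)
        print(loops, c286, c386, f"{(c386 - c286) / c386:.0%} fewer clocks")
```

As the number of busy-wait loops grows, the relative saving moves from about 30% toward the 25% set by the loop itself (18 versus 24 clocks).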
Example 2
The second example outputs an entire string of characters
using the previous interrupt routine (denoted by “INT x” in
the code below). The DS:SI registers point to the beginning
of the string to be output. The string is variable in length and
must be terminated with the “$” character.
To output a string of 20 characters, the 80C286 would take
1,899 clock cycles; using the same routine, the 80386 would
take 2,511 cycles. Each time a string of 20 characters is
output, the 80C286 will save 612 clock cycles, almost 25%
fewer than the 80386 requires. The advantage is
realized through the 80C286’s faster interrupt handling,
faster I/O instructions, faster FAR transfer instructions and
faster register saving and restoring instructions.
                                                                    80C286       80386
                                                                    CLOCK        CLOCK
                                                                    CYCLES       CYCLES
OUT_STRING PROC FAR
            PUSHA                 ; Save caller’s registers.        17           18
NEXT:       LODSB                 ; Load first char to be output.   5            5
            CMP AL, "$"           ; Check to see if end of string.  3            2
            JE DONE               ; If end then go to DONE.         3/7          3/7
            INT x                 ; If not end output character.    73           104
            JMP NEXT              ; Go get next char to output.     7            7
DONE:       POPA                  ; Restore registers when done.    19           24
            RET                   ; Far return.                     15           18
            CALL OUT_STRING       ; Far call to initiate            13           17
                                  ; OUT_STRING procedure.
                                                                    ____         ____
Total number of clocks to start and end routine
+ number of additional clocks to output each
character in the output string.                                     79+91/char   91+121/char
EXAMPLE 2.
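The Example 2 totals can be checked against the 20 character case discussed in the text. The Python sketch below evaluates the two formulas from the listing (79 + 91 per character and 91 + 121 per character). The names are hypothetical, and the model assumes, as the listing does, that the UART is never busy.

```python
# (clocks to start and end the routine, clocks per character output),
# taken from the Example 2 totals. Assumes the UART is never busy.
EXAMPLE_2 = {"80C286": (79, 91), "80386": (91, 121)}


def out_string_clocks(cpu, characters):
    """Clocks to output a string of `characters` characters."""
    if characters < 0:
        raise ValueError("characters must not be negative")
    overhead, per_char = EXAMPLE_2[cpu]
    return overhead + per_char * characters


if __name__ == "__main__":
    c286 = out_string_clocks("80C286", 20)
    c386 = out_string_clocks("80386", 20)
    # Clocks on each processor, the difference, and the relative saving.
    print(c286, c386, c386 - c286, f"{(c386 - c286) / c386:.1%}")
```

For long strings the fixed overhead becomes negligible and the saving approaches the per-character ratio of 30 clocks in 121.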
Example 3
This example adds all the values of a source array in
memory to the values of a destination array in memory. The
result is stored in the destination array. Both arrays are
assumed to be in the current data segment. The count
(number of words in the array), offset of source array, and
offset of destination array are all assumed to be placed on the
stack (in that order) by the calling program. The source code
for the procedure is listed in the Example 3 Table below.
                                                                    80C286   80386
                                                                    CLOCK    CLOCK
                                                                    CYCLES   CYCLES
ADD_ARRAY PROC NEAR
            PUSHA                 ; Save caller’s registers.        17       18
            MOV BP, SP            ; Point BP to current stack.      2        2
            MOV CX, (BP+22)       ; Load array size from stack      5        4
                                  ; into CX.
            MOV SI, (BP+20)       ; Load offset of source array     5        4
                                  ; from stack into SI.
            MOV DI, (BP+18)       ; Load offset of destination      5        4
                                  ; array from stack into DI.
NEXT:       CLD                   ; Clear Direction Flag.           2        2
            LODSW                 ; Load the source word into AX.   5        5
            ADD (DI), AX          ; Add source to destination.      7        7
            ADD DI, 02            ; Point DI to next data.          3        2
            LOOP NEXT             ; Continue to ADD all elements    8/4      11
                                  ; in the two arrays.
            POPA                  ; Restore registers.              19       24
            RET 6                 ; Near return.                    11       10
; Following is the code necessary to set up and call the above procedure.
            PUSH count            ; Put count parameter on stack    5        5
            PUSH offset S_ARRAY   ; Put offset of source array      3        2
EXAMPLE 3.
5