– Based on MIPS 32 RISC architecture with enhancements
– Scalar 5-stage pipeline minimizes branch and load delays
– 66 million multiply-accumulate (MAC) operations per second at 133MHz
– 100MHz and 133MHz operating frequencies
• MIPS 32 instruction set architecture (ISA)
– MIPS IV compatible conditional move instructions
– MIPS IV superset PREF (prefetch) instruction
– Fast multiplier with atomic multiply-add and multiply-subtract
– Count leading zeros/ones instructions
• Large, efficient on-chip caches
– Separate 8kB instruction cache and 2kB data cache
– 2-way set associative
– Write-back and write-through support on a per-page basis
– Optional cache locking with per-line resolution, to facilitate deterministic response
– Simultaneous instruction and data fetch in each clock cycle, sustaining over 1 GB/sec of bandwidth
• Flexible RC4000 compatible MMU with 32-page on-chip TLB
– Variable page size
– Variable number of locked entries
– No performance penalty for address translation
• Flexible bus interface allows simple, low-cost designs
– Bus interface runs at a fraction of the pipeline rate
– Programmable port-width interface (8-, 16-, and 32-bit memory and I/O regions)
– Programmable bus turnaround times (BTA)
– Supports single data or burst transactions
• Improved real-time support
– Fast interrupt decode
• Low-power operation
– Active power management: powers down inactive units
– Typical power 700mW @ 133MHz
– Stand-by mode <300mW
• Enhanced JTAG interface, for low-cost in-circuit emulation (ICE)
• MIPS architecture ensures applications software compatibility throughout the RISController series of embedded processors
• Industrial temperature range support
• 3.3V operation (core and I/O)
Block Diagram: RISCore32300™ extended MIPS 32 integer CPU core; RISCore4000-compatible MMU with TLB; system control coprocessor (CP0); Enhanced JTAG (ICE interface); 8kB 2-set lockable I-cache; 2kB 2-set lockable write-back/write-through D-cache; clock generation unit; RISCore32300 internal bus interface; RC32364 bus interface unit.
The IDT logo is a registered trademark and ORION, RC4650, RC4640, RV4640, RC4600, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
2000 Integrated Device Technology, Inc.
*Notice: The information in this document is subject to change without notice
June 20, 2000
DSC 4510
79RC32364™
Overview
Device Overview
Targeted to a variety of performance-hungry, cost-sensitive
embedded applications, the RC32364 is a new low-powered, low-cost
member of the Integrated Device Technology, Inc. (IDT) RISController
Series of Embedded Microprocessors.
The RC32364 brings 64-bit performance levels to lower cost systems. High performance is achieved through advanced techniques such as large on-chip two-way set-associative caches, a streamlined high-speed pipeline, high bandwidth, and features such as early restart for data cache misses. In addition, IDT proprietary enhancements to the base MIPS architecture further extend the processor's performance in particular applications.
The RC32364 is the first member of a new processor family that uses IDT's proprietary RISCore32300 CPU core. The RISCore32300 core continues IDT's tradition of high performance through high-speed pipelines, high-bandwidth caches, and architectural extensions that serve the needs of specific markets, yet the RC32364 provides these capabilities in a low-cost, high-speed 32-bit enhanced MIPS architecture core, enabling a new level of price-performance.
Around the RISCore32300, the RC32364 integrates a fully RC5000
compatible memory management unit (MMU), substantial amounts of
efficient cache memory, an enhanced debug capability, digital signal
processing (DSP) extensions, and a low-cost system interface. The
resulting device is well suited to the needs of mid-range communications
equipment, xDSL equipment, and consumer devices.
Also, being upwardly software compatible with the RC3000 family,
the RC32364 will serve in many of the same applications as well as
support applications that require integer DSP functions.
Device Performance
The RC32364 is rated at 175 Dhrystone MIPS at 133MHz. The internal cache bandwidth is over 1.2 GB/sec, with an external bus bandwidth of 260MB/sec. Computational performance is further enhanced by the device's DSP capability, which supports 66 million multiply-accumulate (MAC) operations per second at 133MHz.
The RISCore32300 uses a 5-stage pipeline, similar to the RISCore3000 and RISCore4000 processor families. The simplicity of the pipeline enables the processor to achieve high frequency while minimizing device complexity, reducing both cost and power consumption. Because this pipeline is not sensitive to the data conflicts that slow down superscalar machines, an added benefit of this approach is that sustained performance is much closer to the theoretical maximum.
The RISCore32300 integer execution unit implements the MIPS 32 ISA. The RISCore32300 thus implements a load/store architecture with single-cycle ALU operations (logical, shift, add, subtract) and an autonomous multiply/divide unit. The 32-bit register resources include 32 general-purpose orthogonal integer registers, the HI/LO result registers for the integer multiply/divide unit, and the program counter.
RISCore32300 CPU core features include:
• MIPS IV prefetch operations, with various innovative hint subfields
• MIPS IV compatible conditional move instructions
• MAD, MUL, and MSUB instructions added to the integer multiply unit
• Two new instructions: Count Leading Ones (CLO) and Count Leading Zeros (CLZ)
These integer unit enhancements combine to make the CPU well suited to applications that require high bandwidth, rapid computation, and/or DSP capability.
The RISCore32300 register file has 32 general-purpose 32-bit registers that are used for scalar integer operations and address calculation. The register file has two read ports and two write ports and is fully bypassed to minimize operation latency in the pipeline.
The RISCore32300 arithmetic logic unit (ALU) consists of the integer adder and the logic unit. The adder performs address calculations in addition to arithmetic operations; the logic unit performs all of the logic and shift operations. Each unit is highly optimized and can perform an operation in a single pipeline cycle.
The RC32364 uses a dedicated integer multiply/divide unit, optimized for high-speed multiply and multiply-accumulate operations. Table 1 lists the repeat rate (peak issue rate: the number of cycles until the operation can be reissued), the latency (number of cycles until a result is available), and the number of processor stalls (number of cycles by which the CPU always delays the pipeline) for these operations. All values are expressed in pipeline clocks.
Opcode                   Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U, MSUB/U    16 bit         3         2        0
                         32 bit         4         3        0
MUL                      16 bit         3         2        1
                         32 bit         4         3        2
DIV, DIVU                any            36        36       0
Table 1 RISCore32300 Integer Multiply/Divide Unit Operation Frequency
The original MIPS architecture defines that the results of a multiply
or divide operation are placed in the HI and LO registers. Using the
move-from-HI (MFHI) and move-from-LO (MFLO) instructions, these
values can then be transferred to the general purpose register file.
As an enhancement to the original MIPS ISA, the RC32364 implements an additional multiply instruction, MUL, which places the multiply result directly into the primary register file, bypassing the LO register. Avoiding the explicit MFLO instruction required when using LO, and supporting multiple destination registers, increases the throughput of multiply-intensive operations.
Two atomic operations, multiply-add (MAD) and multiply-subtract (MSUB), are used to perform the multiply-accumulate and multiply-subtract operations. The MAD instruction multiplies two numbers and then adds the product to the current contents of the HI and LO registers. Similarly, the MSUB instruction multiplies two operands and then subtracts the product from the HI and LO registers.
The MAD and MSUB operations are used in numerous DSP algorithms and allow the RC32364 to reduce the cost of systems requiring a mix of DSP and control functions.
Finally, these operations are implemented with low latency and are pipelined, so new operations can be issued before a previous operation has completed. The RC32364 also performs automatic operand size detection and implements hardware interlocks to prevent overrun, achieving high performance with simple programming.
Operation Modes
The RC32364 supports two modes of operation: user mode and kernel mode. User mode is most often used for application programs, and kernel mode is typically used for handling exceptions and operating system kernel functions, including CP0 management and I/O device access.
The processor enters kernel mode at reset and when an exception is
recognized. While in kernel mode, software has access to the entire
address space as well as all of the CP0 registers. User mode accesses
are limited to a subset of the virtual address space and can be inhibited
from accessing CP0 functions.
Virtual-to-Physical Address Mapping
The RC32364's 4GB virtual address space is divided into addresses that are accessible in either kernel or user mode (kuseg) and those that are accessible only in kernel mode (kseg0, kseg1, and kseg2).
Bits in a status register determine which virtual addressing mode will
be used. While in user mode, the RC32364 provides a single, uniform
2GB virtual address space for the user’s program. While operating in
kernel mode, four distinct virtual address spaces, totalling 4GB, are
simultaneously available and are differentiated by the high-order bits of
the virtual address.
The RC32364 reserves a small portion of the kernel address space
for on-chip resources. These resources include those used by the
Enhanced JTAG unit as well as registers used to configure the system
bus interface.
For fast virtual-to-physical address decoding, the RC32364 uses a fully associative translation lookaside buffer (TLB) that maps 32 virtual pages to their corresponding physical addresses. The TLB is organized as 16 pairs of even/odd entries mapping pages of sizes that vary from 4kBytes to 16MBytes into the 4GB physical address space.
To assist in controlling both the amount of mapped space and the replacement characteristics of various memory regions, the RC32364 provides two mechanisms. First, the page size can be configured on a per-entry basis, from 4kB to 16MB in powers of 4 (4kB, 16kB, 64kB, 256kB, 1MB, 4MB, or 16MB). A CP0 register is loaded with the mapping page size, which is then entered into the TLB when a new entry is written. Thus, operating systems can provide special-purpose maps; for example, a typical frame buffer can be memory mapped with only one TLB entry.
The second mechanism controls the replacement algorithm used when a TLB miss occurs. To select a TLB entry to be written with a new mapping, the RC32364 provides a random replacement algorithm; however, a system-specific number of mappings can be locked into the TLB so that they are never randomly replaced. This facilitates the design of real-time systems by allowing deterministic access to critical software.
The RC32364’s TLB also contains information to control the cache
coherency protocol for each page. Specifically, each page has attribute
bits to determine whether the coherency algorithm is uncached, non-
coherent write-back, or non-coherent write-through no write-allocate.
This allows the system architect to allocate address space according to the most efficient use of bus bandwidth. For example, stack data may always be accessed as write-back, while packet data may be best accessed as write-through, for later DMA out to an I/O port.
The RC32364 cache controller works in conjunction with these attributes, enabling an application to alias a region of physical memory through multiple virtual spaces. The cache controller also ensures that, regardless of which address space is used, the current copy of the data is provided when referenced, and it further guarantees that the cache is properly managed with respect to main memory.
System Control Coprocessor (CP0)
In the MIPS architecture, the system control coprocessor is responsible for the virtual-to-physical address translation and cache protocols, the exception control system, and the processor's diagnostic capability. Also, the system control coprocessor (and thus the kernel software) is implementation dependent.
Although the RISCore32300 implements a 32-bit ISA, the memory management unit (MMU) that the RC32364 incorporates is modeled after the MMU found in the 64-bit RC5000 family. Relative to the traditional 32-bit R3000-style MMU, it offers variable page size, enhanced cache write algorithm support, mapping of a larger portion of the virtual address space, and a variable number of locked entries.
The RC32364's translation lookaside buffer (TLB) contains 16 entries, mapping a total of 32 pages or as much as 512MB of memory at a time.
The exception model implemented in the RC32364 is also consistent with that of the RC5000 family, including the treatment of kernel mode and exception processing.
The RC32364 incorporates all system control coprocessor (CP0) registers on-chip. These registers provide the path through which the virtual memory system's address translation is controlled, exceptions are handled, and operating modes are selected (for example, kernel vs. user mode, interrupts enabled or disabled, and cache features).
In addition, the RC32364 includes registers that implement a real-time cycle counting facility, which aids in cache diagnostic testing, assists in data error detection, and facilitates software debug. Alternatively, this timer can be used as the operating system reference timer and can signal a periodic interrupt.
Cache Memory
To keep the RC32364's high-performance pipeline full and operating efficiently, the RC32364 incorporates on-chip instruction and data caches that can each be accessed in a single processor cycle. Each cache has its own 32-bit data path, so both caches can be accessed in the same pipeline clock cycle.
The RC32364 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 8kB in size and parity protected. Because this cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access. The tag holds a 21-bit physical address, a valid bit, a lock bit, a parity bit, and the FIFO replacement bit.
For fast, single-cycle data access, the RC32364 includes a 2kB on-chip data cache that is two-way set associative with a fixed 16-byte (four-word) line size. The data cache is protected with byte parity, and its tag is protected with a single parity bit. It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access.
The RC32364 supports a cache-locking feature that locks critical sections of code and data into the on-chip caches to guarantee fast accesses. Cache locking is implemented on a per-line basis, enabling the system designer to maximize the efficiency of the system cache.
Writes to external memory, whether cache miss write-backs or stores to uncached or write-through addresses, use the on-chip write buffer. The write buffer holds a maximum of four address and data pairs. A data cache write-back uses the entire buffer (a 16-byte line is four words) and allows the processor to proceed in parallel with the memory update.
Debug Support
To facilitate software debug, the RC32364 adds a pair of watch registers to CP0. When enabled, these registers will cause the CPU to take an exception when a “watched” address is appropriately accessed.
In addition, the RC32364 implements an Enhanced JTAG interface, backed by significant on-chip debug support logic, which facilitates the development of low-cost in-circuit emulation (ICE) equipment.
The Enhanced JTAG interface has two modes of operation: Run-Time Mode and Real-Time Mode.
The Run-Time Mode provides a standard JTAG interface for on-chip
debugging, and the Real-Time Mode provides additional status pins—
PCST[2:0]—which are used in conjunction with JTAG pins for Real-Time
Trace information at the processor internal clock or any division of the
pipeline clock.
The RC32364 implements the traditional RC4000 model of interrupt processing. However, this model has been enhanced to benefit real-time systems.
To speed interrupt exception decoding, the RC32364 adds a separate interrupt vector. The RC3000 family uses a single common exception vector for all exception types, including interrupts. The RC32364, by contrast, allows kernel software to enable a separate interrupt exception vector.
When enabled, this vector speeds interrupt processing because software no longer has to distinguish interrupts from general-purpose exceptions.
System interfaces
The RC32364 supports a 32-bit system interface, allowing the CPU to interface with a lower-cost memory system. The main features of the system interface include:
• Multiplexed address and data bus with an Address Latch Enable (ALE) signal to demultiplex the A/D bus.
• Support for variable port widths, including the boot device.
• Support for multiple pipeline-to-system clock ratios, with the CPU core frequency derived from the input system clock.
• Incorporation of a DMA arbiter, allowing an external master to take control of the external bus.
The 32-bit system address/data (A/D) bus is used to transfer addresses and data between the RC32364 and the rest of the system. The ALE signal is provided to demultiplex the address from this bus. The DATAEN* signal indicates the data phase of the A/D bus, and DT/R* indicates the direction of data flow. BE*[3:0] indicates the valid bytes on the bus. The additional ADDR[3:2] signals provide the incremental address during burst transfers.
To indicate system interface bus activity, the RC32364 provides a cycle-in-progress (CIP*) signal. The RD* and WR* signals indicate the type of cycle in progress. To terminate the cycle in progress, the RC32364 also provides Ack*, Retry*, and BusErr* signals. The device also provides the I/D* signal to indicate whether instructions or data are being transferred. The Last* signal indicates that the last data transfer is in progress.
The RC32364 provides six external interrupt signals, INT*[5:0], and a non-maskable interrupt (NMI*) signal.
To share the system interface bus, the RC32364 provides BusReq* and BusGnt* signals to interface with external DMA masters. A DMA arbiter is provided to allow an external master to take control of the external bus.
The RC32364 supports a variable bus width interface, enabling the CPU to operate with a mix of 8-bit, 16-bit, and 32-bit wide memories. To indicate the width of the memory or I/O space being accessed, the RC32364 provides two output signals, Width[1:0]. The width of the various address spaces is programmed using the Port Width Control Register. The RC32364's physical memory is divided into several regions, and each region's width can be programmed using this register. Within these regions, the bus turnaround time can also be programmed.
Thus, the RC32364 can be simply mated with low-cost external memory subsystems. The large on-chip caches and the early restart feature allow the processor to achieve high performance even with such low-cost memory.
The RISCore32300 offers a number of features relevant to low-power systems, including low-power design, active power management, and power-down modes of operation. The RISCore32300 is a static design. The RC32364 supports a WAIT instruction, which signals the rest of the chip that execution and clocking should be halted, reducing system power consumption during idle periods.
Development Tools
An array of tools facilitates rapid development of RC32364-based systems, allowing a wide variety of customers to take advantage of the processor's high-performance capabilities while meeting short time-to-market goals.
The RC32364 incorporates an enhanced JTAG debug interface. This interface uses a small number of pins, combined with on-chip debug support logic, to enable the development of low-cost in-circuit emulators for high-speed IDT processors.
Typical values of the case-to-ambient thermal resistance, θ_CA, for the 144 TQFP package at various airflows are shown in Table 2. Note that the RC32364 implements advanced power management, which substantially reduces the average power dissipation of the device.
Airflow (ft/min)   θ_CA, 144 TQFP (°C/W)
0                  27
200                22
400                20
600                17
800                15
1000               14
Table 2 Thermal Resistance (θ_CA) at Various Airflows
Revision History
August 1999: Changed references from MIPS-II to MIPS 32. Changed references from MIPS-IV to MIPS 64. Changed values in the Clock Parameters table, System Interface Parameters table, and Power Consumption table. Deleted several timing diagrams. Added JTAG timing diagram.
Jan. 12, 2000: Corrected information regarding the TRST* signal in Table 3. TRST* requires an external pull-down on the board.
April 4, 2000: Adjusted values for DCLK in the System Interface Parameters table. Added power curves.
June 20, 2000: Changed times for the Data Output Hold, TDO Output Delay Time, and TPC Output Delay Time parameters in the System Interface Parameters table. Revised values for PCST Output Delay Time in the System Interface Parameters table.
Thermal Considerations
The RC32364 is a low-power CPU, consuming approximately 0.9W peak power. Thus, no special packaging considerations are required.
The RC32364 is guaranteed in a case temperature range of 0°C to +85°C for commercial temperature devices and -40°C to +85°C for industrial temperature devices. The type of package, speed (power) of the device, and airflow conditions affect the equivalent ambient temperature conditions that will meet this specification.
The equivalent allowable ambient temperature, T_A, can be calculated using the thermal resistance from case to ambient (θ_CA) of the given package. The following equation relates ambient and case temperatures:
$$T_A = T_C - P \cdot \theta_{CA}$$
where T_C is the case temperature and P is the maximum power consumption at hot temperature, calculated by using the maximum I_CC specification for the device.