
Research Report of FH Brandenburg, Part I: Scientific Contributions

Parallel Memory Architectures for Video Encoding Systems, Part II: Applications

Reiner Creutzburg, Michael Gössel, Jarkko Niittylahti, Tero Sihvo, Jarmo Takala, and Jarno Tanskanen

Fachhochschule Brandenburg – University of Applied Sciences, Fachbereich Informatik und Medien, P.O. Box 2132, D – 14737 Brandenburg, Germany;

Tampere University of Technology, Digital and Computer Systems Laboratory, P.O.Box 553, FIN-33101 Tampere, Finland;

Universität Potsdam, Institut für Informatik, August-Bebel-Str. 89, D – 14482 Potsdam, Germany

ABSTRACT:

In this paper, we apply the theory of parallel memory architectures presented in Part I [11, 12] to develop a parallel architecture for H.263 video encoding that consists of SIMD-type connected on-chip parallel processors and a parallel memory architecture.

3. PARALLEL MEMORY ARCHITECTURES FOR VIDEO CODING

Many video coding operations require high computing power and high memory bandwidth. One way to increase the processing power is to use multiple, or parallel, processors. In parallel processing, the increased number of operations executed per time unit leads to an increased number of operands. Typically, the operands have to be in the register file of the processor before the operation is executed. Since the number of registers is usually small, the load/store operations that transfer data between the register file and the data memory are needed frequently. This creates the need for considerable bandwidth between the processor and the data memory. The bandwidth can be increased by accessing several operands simultaneously, which requires several independent memory banks, i.e., a parallel memory. Successive memory operations usually concentrate on a small address range or picture region of interest; examples of such regions are the macroblock and the search area in motion estimation. The data blocks processed at a time typically fit into the internal parallel memory.
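To make the idea of several independent memory banks concrete, the following Python sketch stores a picture in N modules using a linear skewing scheme, S(x, y) = (x + y) mod N, and checks whether a group of N pixels (an access format such as a row or column segment) can be fetched in one memory cycle. The skewing function, the address function, and all helper names are assumptions chosen for illustration; they are not necessarily the module assignment used in this architecture.

```python
# Sketch of a 2-D parallel memory with N modules (banks).
# ASSUMPTION: linear skewing scheme S(x, y) = (x + y) mod N and address
# A(x, y) = y * (width // N) + x // N; illustrative only, not necessarily
# the module assignment used in the paper. Requires width % N == 0.


def module_of(x: int, y: int, n: int) -> int:
    """Index of the memory module that stores pixel (x, y)."""
    return (x + y) % n


def address_of(x: int, y: int, n: int, width: int) -> int:
    """Location of pixel (x, y) inside its module."""
    return y * (width // n) + x // n


def row_format(x: int, y: int, n: int) -> list[tuple[int, int]]:
    """Access format: N horizontally adjacent pixels starting at (x, y)."""
    return [(x + k, y) for k in range(n)]


def column_format(x: int, y: int, n: int) -> list[tuple[int, int]]:
    """Access format: N vertically adjacent pixels starting at (x, y)."""
    return [(x, y + k) for k in range(n)]


def conflict_free(pixels: list[tuple[int, int]], n: int) -> bool:
    """True if no two pixels share a module, so all of them can be
    read in a single memory cycle."""
    modules = [module_of(x, y, n) for x, y in pixels]
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    N = 4
    print(conflict_free(row_format(3, 5, N), N))     # True
    print(conflict_free(column_format(3, 5, N), N))  # True
    anti_diagonal = [(k, 3 - k) for k in range(N)]   # all land in module 3
    print(conflict_free(anti_diagonal, N))           # False
```

With N = 4, row and column segments hit four different modules, while the anti-diagonal falls entirely into one module and would need four memory cycles. Which access formats must be conflict-free is what drives the choice of module assignment.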

The parallel processors, the control processor, and the variable-length coding (VLC) processor used in the proposed architecture could all be identical, simple, small, low-power DSP cores. This may significantly speed up the implementation process compared with an application-specific integrated circuit (ASIC) design, where a dedicated block has to be implemented for each operation. There are many parallel processor implementations suitable for image and video processing, e.g., the highly parallel DSP (HiPAR) with four or sixteen parallel data paths [1, 2], a highly parallel single-chip video DSP with four parallel processing units [3], and a parallel DSP for mobile multimedia processing with four parallel data paths [4].

The required processing power depends, e.g., on the optimization level of the code, the picture resolution, the frame rate, the algorithms, and the optional coding modes. A similar architecture might also be used for decoding; thus, the application could be an encoder, a decoder, or both. The required computing power and the number of parallel processors are determined accordingly.
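As a rough worked illustration of how the required computing power translates into a processor count, the Python sketch below estimates the absolute-difference operations of integer-pel full-search motion estimation for QCIF (176 × 144) video and then picks the smallest N from {2, 4, 8, 16, 32}. The frame size, frame rate, search range of ±15 pixels, and per-processor throughput are assumed example values; the model ignores all other encoder stages and assumes perfectly linear speedup.

```python
# Rough processor-count estimate driven by motion estimation alone.
# ASSUMPTIONS: integer-pel full search, +/-p search range, perfectly linear
# speedup, and a hypothetical per-processor throughput; other encoder
# stages (DCT, quantization, VLC) are ignored.
from typing import Optional

MB_SIZE = 16  # H.263 macroblock: 16 x 16 luminance pixels
PROCESSOR_CHOICES = (2, 4, 8, 16, 32)


def macroblocks_per_frame(width: int, height: int) -> int:
    return (width // MB_SIZE) * (height // MB_SIZE)


def full_search_ops_per_second(width: int, height: int, fps: int, p: int) -> int:
    """Absolute differences per second for full-search block matching."""
    candidates = (2 * p + 1) ** 2          # candidate vectors per macroblock
    ops_per_candidate = MB_SIZE * MB_SIZE  # one |a - b| per pixel
    return macroblocks_per_frame(width, height) * candidates * ops_per_candidate * fps


def processors_needed(required_ops: int, ops_per_processor: int) -> Optional[int]:
    """Smallest N whose ideal aggregate throughput covers the load,
    or None if even N = 32 is not enough."""
    for n in PROCESSOR_CHOICES:
        if n * ops_per_processor >= required_ops:
            return n
    return None


if __name__ == "__main__":
    load = full_search_ops_per_second(176, 144, fps=15, p=15)  # QCIF
    print(load)                                  # 365333760
    print(processors_needed(load, 100_000_000))  # 4 (hypothetical core speed)
```

For QCIF at 15 frames/s, full search alone needs 99 × 31² × 256 × 15 = 365,333,760 absolute differences per second; with a hypothetical 100 million such operations per second per core, N = 4 would be chosen.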

3.1 The Proposed Parallel Architecture

In our case, there are two dual-ported data memory modules per parallel processor. One port is intended for the parallel processors, and the other is reserved for the direct memory access (DMA) controller and the control and VLC processors, so that they can operate concurrently with the parallel processing. All processors access the common data through more or less application-specific access formats. This realization provides very high bandwidth, since each parallel processor can perform two memory accesses per clock cycle according to the access formats. In addition, the number of required memory locations is reasonable, because the same data need not be kept in different memory modules. However, extra logic is needed for the address calculation, for full crossbars between the parallel processors and the parallel memory, and for interconnection networks between the control, DMA, and VLC processors and the parallel memory. Even if the bandwidth of the additional port of the parallel memory modules is not fully utilized, using it may significantly shorten the processing time compared with a situation where, e.g., the parallel operations and the DMA operations are performed sequentially. On the other hand, if the processing-time requirement is less strict, the dual-ported parallel memory modules might be replaced by single-ported ones to save some silicon area.
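The benefit of the second port can be estimated with a simple timing model. The Python sketch below assumes double buffering: while block i is processed through one port, the DMA controller loads block i + 1 through the other, so each step costs the maximum of the two times instead of their sum. It also computes the peak operand bandwidth of two accesses per processor per clock cycle. The function names, cycle counts, and clock frequency are hypothetical.

```python
# Simple timing model for the second (DMA) port of the data memory modules.
# ASSUMPTIONS: blocks are double-buffered, the DMA transfer of block i + 1
# uses the second port while block i is processed, proc and dma have equal
# length, and cycle counts are hypothetical inputs, not measured values.


def sequential_cycles(proc: list[int], dma: list[int]) -> int:
    """Single-ported modules: loading and processing never overlap."""
    return sum(proc) + sum(dma)


def overlapped_cycles(proc: list[int], dma: list[int]) -> int:
    """Dual-ported modules: load block i + 1 while block i is processed."""
    if not proc:
        return 0
    total = dma[0]                         # first block must be loaded first
    for i in range(len(proc) - 1):
        total += max(proc[i], dma[i + 1])  # the slower activity sets the pace
    return total + proc[-1]                # last block: processing only


def peak_words_per_second(n: int, f_clk_hz: int, accesses_per_cycle: int = 2) -> int:
    """Peak operand bandwidth of N parallel processors, each making
    `accesses_per_cycle` memory accesses per clock cycle."""
    return n * accesses_per_cycle * f_clk_hz


if __name__ == "__main__":
    proc, dma = [100] * 4, [60] * 4
    print(sequential_cycles(proc, dma))           # 640
    print(overlapped_cycles(proc, dma))           # 460
    print(peak_words_per_second(8, 100_000_000))  # 1600000000
```

With four blocks of 100 processing cycles and 60 DMA cycles each, the sequential schedule takes 640 cycles and the overlapped one 460, even though the DMA port is idle for 40 cycles of every overlapped step.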

In the parallel architecture shown in Figure 3.1, there are $N + 2$ DSP processor cores: $N$ parallel processors, a control processor, and a VLC processor. The parallel processors are denoted $\mathrm{DSP}_0, \mathrm{DSP}_1, \ldots, \mathrm{DSP}_{N-1}$, where $N \in \{2, 4, 8, 16, 32\}$ is the number of parallel processors. It is assumed that the parallel processors are small, low-power DSPs that can load two operands in parallel. When the processing power requirements increase or the number of parallel processors is decreased, the instruction set of the parallel processors can be optimized. Likewise, the instruction set of the VLC processor can be optimized for variable-length coding. Alternatively, the VLC processor can be replaced with a dedicated VLC core. On the other hand, with lower computational power requirements, variable-length coding could be performed by the control processor, and the VLC processor could be removed from the design.
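The configuration options described here can be captured in a small data structure. The following Python sketch, whose class and field names are hypothetical, validates N against the listed values and derives the number of DSP cores ($N + 2$, or $N + 1$ when variable-length coding is moved to the control processor) and the number of data memory modules (two per parallel processor). Replacing the VLC processor with a dedicated VLC core is not modeled.

```python
# Hypothetical description of the configurable architecture in this section.
from dataclasses import dataclass

VALID_N = (2, 4, 8, 16, 32)


@dataclass(frozen=True)
class EncoderArchitecture:
    n_parallel: int                      # N parallel DSPs: DSP_0 .. DSP_{N-1}
    separate_vlc_processor: bool = True  # False: the control processor does VLC

    def __post_init__(self) -> None:
        if self.n_parallel not in VALID_N:
            raise ValueError(f"N must be one of {VALID_N}, got {self.n_parallel}")

    def processor_names(self) -> list[str]:
        names = [f"DSP_{i}" for i in range(self.n_parallel)] + ["control"]
        if self.separate_vlc_processor:
            names.append("VLC")
        return names

    @property
    def dsp_cores(self) -> int:
        """N + 2 cores, or N + 1 without a separate VLC processor."""
        return len(self.processor_names())

    @property
    def memory_modules(self) -> int:
        """Two data memory modules per parallel processor: M_0 .. M_{2N-1}."""
        return 2 * self.n_parallel


if __name__ == "__main__":
    arch = EncoderArchitecture(8)
    print(arch.dsp_cores, arch.memory_modules)  # 10 16
    print(EncoderArchitecture(8, separate_vlc_processor=False).dsp_cores)  # 9
```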

The parallel processors are connected to the parallel memory using two $N$-ported, bidirectional crossbars. The parallel memory consists of the memory modules $M_0$ to $M_{2N-1}$. Half of the memory modules, from $M_0$ to $M_{N-1}$, have 16-bit memory locations, and the other half, from $M_N$
