Research Report of FH Brandenburg (Forschungsbericht der FH Brandenburg), Part I: Scientific Contributions
Parallel Memory Architectures for Video Encoding Systems,
Part II: Applications
Reiner Creutzburg¹, Michael Gössel³, Jarkko Niittylahti², Tero Sihvo², Jarmo Takala², and Jarno Tanskanen²
¹ Fachhochschule Brandenburg – University of Applied Sciences, Fachbereich Informatik und Medien, P.O. Box 2132, D – 14737 Brandenburg, Germany;
² Tampere University of Technology, Digital and Computer Systems Laboratory, P.O. Box 553, FIN-33101 Tampere, Finland;
³ Universität Potsdam, Institut für Informatik, August-Bebel-Str. 89, D – 14482 Potsdam, Germany
ABSTRACT:
In this paper, we apply the theory of parallel memory architectures given
in Part I [11, 12] to develop a parallel architecture for H.263 video
encoding, consisting of SIMD-type connected on-chip parallel processors
and a parallel memory architecture.
3. PARALLEL MEMORY ARCHITECTURES FOR VIDEO CODING
Many video coding operations require high computing power and high
memory bandwidth. One way to increase the processing power is to use
multiple, parallel processors. In parallel processing, the increased
number of operations executed per time unit leads to an increased
number of operands per time unit. Typically, the operands have to be
in the register file of the processor before the operation is executed.
Usually, the number of registers is small. Thus the load/store
operations, which transfer data between the register file and the data
memory, are frequently needed. This creates the need for considerable
bandwidth between the processor and the data memory. The bandwidth can
be increased by accessing several operands simultaneously, which
requires several independent memory banks, i.e., a parallel memory.
Successive memory operations usually concentrate on a small address
range or picture region of interest. Examples of such regions are the
macroblock and the search area in motion estimation. The data blocks
processed at a time typically fit into the internal parallel memory.
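The bandwidth argument can be made concrete with a small cycle count. The following Python sketch is illustrative only: it assumes ideal, conflict-free accesses (each bank port delivers one word per cycle), one pixel per word, 16 × 16 macroblocks, and integer motion vectors within ±15 pixels. The function names `load_cycles` and `search_window_pixels` are hypothetical.

```python
import math


def load_cycles(operands: int, banks: int, ports_per_bank: int = 1) -> int:
    """Cycles to fetch `operands` words, assuming every access is
    conflict-free, i.e. each bank port serves one word per cycle."""
    return math.ceil(operands / (banks * ports_per_bank))


def search_window_pixels(block: int = 16, search_range: int = 15) -> int:
    """Pixels in a square search area for a block x block macroblock with
    integer motion vectors in [-search_range, +search_range]."""
    side = block + 2 * search_range
    return side * side


# A SAD over one 16 x 16 macroblock reads 256 current + 256 reference pixels.
sad_operands = 2 * 16 * 16
print(load_cycles(sad_operands, banks=1))   # 512: one single-ported memory
print(load_cycles(sad_operands, banks=16))  # 32: 16 conflict-free banks
print(16 * 16 + search_window_pixels())     # 2372 pixels: macroblock + search area
```

Under these assumptions, the working set of a few thousand pixels is small enough to keep on chip, while the operand traffic shrinks in proportion to the number of banks that can be read in parallel.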
The parallel processors, the control processor, and the variable length
coding (VLC) processor used in the proposed architecture could be
identical, simple, small, low-power DSP cores. This may significantly
speed up the implementation process compared to application-specific
integrated circuit (ASIC) design, where a dedicated block has to be
realized for each operation. Many parallel processor implementations
suitable for image and video processing exist, e.g., the highly
parallel DSP (HiPAR) with four or sixteen parallel data paths [1, 2],
a highly parallel single-chip video DSP with four parallel processing
units [3], and a parallel DSP for mobile multimedia processing with
four parallel data paths [4].
The required processing power depends on, e.g., the optimization level
of the code, the picture resolution, the frame rate, the algorithms,
and the optional coding modes. A similar architecture might also be
used for decoding. Thus, the application could be an encoder, a
decoder, or both. The required computation power and the number of
parallel processors are determined accordingly.
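How the number of parallel processors might follow from the workload can be sketched as a simple budget calculation. This is a minimal sketch, not the authors' sizing method: the per-macroblock cycle count and the clock rate are hypothetical inputs (the text gives no figures), and the work is assumed to split perfectly across the processors. Only the allowed values of N are taken from the text.

```python
ALLOWED_N = (2, 4, 8, 16, 32)  # parallel-processor counts named in the text


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """16 x 16 luminance macroblocks that must be encoded per second."""
    return (width // 16) * (height // 16) * fps


def processors_needed(width: int, height: int, fps: int,
                      cycles_per_mb: int, clock_hz: int) -> int:
    """Smallest allowed N whose combined cycle budget covers the workload.

    Assumes the work splits perfectly across the N processors, which is
    optimistic: synchronisation and memory stalls are ignored.
    """
    required = macroblocks_per_second(width, height, fps) * cycles_per_mb
    for n in ALLOWED_N:
        if n * clock_hz >= required:
            return n
    raise ValueError("workload exceeds 32 processors at this clock rate")


# Hypothetical budget: 100 000 cycles per macroblock on 50 MHz cores.
print(processors_needed(176, 144, 15, 100_000, 50_000_000))  # QCIF at 15 fps: 4
print(processors_needed(352, 288, 30, 100_000, 50_000_000))  # CIF at 30 fps: 32
```

Raising the resolution or frame rate, or using less optimized code (a larger `cycles_per_mb`), pushes the answer toward larger N, which matches the dependencies listed above.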
3.1 The Proposed Parallel Architecture
In our case, there are two dual-ported data memory modules per parallel
processor. One port is intended for the parallel processors, and the
other is reserved for the direct memory access (DMA) controller and the
control and VLC processors, so that they can operate concurrently with
the parallel processing. All processors have access to the common data
by using these, more or less application-specific, access formats. This
realization provides very high bandwidth, since each parallel processor
can perform two memory accesses per clock cycle according to the access
formats. In addition, the number of required memory locations is
reasonable, because there is no need to keep the same data in different
memory modules. However, extra logic is needed for the address
calculation, for the full crossbars between the parallel processors and
the parallel memory, and for the interconnection networks between the
control, DMA, and VLC processors and the parallel memory. Even if the
bandwidth of the additional port of the parallel memory modules is not
fully utilized, using it may significantly shorten the processing time
compared to the situation where, e.g., the parallel operations and the
DMA operations are performed sequentially. On the other hand, if the
processing time requirement is less strict, the dual-ported memory
modules might be replaced by single-ported memory modules to save
silicon area.
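The address calculation and the notion of a conflict-free access format can be illustrated with a small model. This sketch assumes a linear skewing scheme S(x, y) = (x + k·y) mod m for the module number and a row-major address inside each module; both are textbook choices from the parallel-memory literature and are not necessarily the formats defined in Part I. The function names and the 4 × 2 "block" format are hypothetical, and m here simply denotes the number of modules read in one cycle.

```python
from itertools import product


def module(x: int, y: int, m: int, k: int) -> int:
    """Linear skewing: module index of pixel (x, y) among m modules."""
    return (x + k * y) % m


def address(x: int, y: int, m: int, width: int) -> int:
    """Word address inside the module; assumes width % m == 0."""
    return (y * width + x) // m


def access_format(kind: str, m: int, x0: int = 0, y0: int = 0):
    """Pixel coordinates of a few m-element access formats (hypothetical)."""
    if kind == "row":
        return [(x0 + i, y0) for i in range(m)]
    if kind == "column":
        return [(x0, y0 + i) for i in range(m)]
    if kind == "block":  # (m // 2) pixels wide, 2 pixels high; m must be even
        return [(x0 + i, y0 + j) for j in range(2) for i in range(m // 2)]
    raise ValueError(f"unknown access format: {kind}")


def conflict_free(kind: str, m: int, k: int, x0: int = 0, y0: int = 0) -> bool:
    """True if every pixel of the format lies in a different module, so all
    m words can be read in a single memory cycle."""
    mods = [module(x, y, m, k) for x, y in access_format(kind, m, x0, y0)]
    return len(set(mods)) == m


def storage_is_injective(width: int, height: int, m: int, k: int) -> bool:
    """Check that every pixel gets a unique (module, address) pair."""
    seen = {(module(x, y, m, k), address(x, y, m, width))
            for x, y in product(range(width), range(height))}
    return len(seen) == width * height


for k in (1, 4):
    # Prints 1 [True, True, False], then 4 [True, False, True].
    print(k, [conflict_free(f, 8, k, x0=3, y0=5) for f in ("row", "column", "block")])
print(storage_is_injective(16, 4, 8, 4))  # True
```

With eight modules, k = 1 serves rows and columns but not 4 × 2 blocks, while k = 4 serves rows and blocks but not columns. No single linear scheme fits every format, which is one reason the access formats end up more or less application-specific.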
In the parallel architecture shown in Figure 3.1, there are N + 2 DSP
cores: N parallel processors, a control processor, and a VLC processor.
The parallel processors are denoted $\mathrm{DSP}_0, \mathrm{DSP}_1, \mathrm{DSP}_2, \ldots, \mathrm{DSP}_{N-1}$,
where $N \in \{2, 4, 8, 16, 32\}$ is the number of parallel processors.
It is assumed that the parallel processors are small, low-power DSPs
that can load two operands in parallel. When the processing power
requirements increase or the number of parallel processors is
decreased, the instruction set of the parallel processors can be
optimized. Likewise, the instruction set of the VLC processor can be
optimized for VLC coding. Alternatively, the VLC processor can be
replaced with a dedicated VLC core. On the other hand, with lower
computational power requirements, VLC coding could be performed by the
control processor, and the VLC processor could be removed from the
design.
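The configuration options described above (N parallel cores plus a control core, with the VLC work either on its own core or folded into the control processor) can be captured in a few lines. This is a hypothetical bookkeeping sketch; the class and function names are invented for illustration, and only the allowed values of N and the N + 2 core count come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_N = (2, 4, 8, 16, 32)  # parallel-processor counts named in the text


@dataclass
class CoreConfiguration:
    parallel: list[str]  # DSP_0 ... DSP_{N-1}
    control: str
    vlc: str | None      # None: VLC coding is done by the control processor

    @property
    def core_count(self) -> int:
        return len(self.parallel) + 1 + (1 if self.vlc is not None else 0)


def configure(n: int, separate_vlc: bool = True) -> CoreConfiguration:
    """Build the core list for N parallel DSPs plus the control core and,
    optionally, a separate VLC core."""
    if n not in VALID_N:
        raise ValueError(f"N must be one of {VALID_N}, got {n}")
    parallel = [f"DSP_{i}" for i in range(n)]
    return CoreConfiguration(parallel, "control", "vlc" if separate_vlc else None)


print(configure(8).core_count)                      # 10 = N + 2
print(configure(8, separate_vlc=False).core_count)  # 9 = N + 1
```

Replacing the VLC processor with a dedicated VLC core would not change this count; it would only change what the `vlc` entry stands for.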
The parallel processors are connected to the parallel memory through
two N-ported, bidirectional crossbars. The parallel memory consists of
the memory modules $M_0$ to $M_{2N-1}$. Half of the memory modules,
$M_0$ to $M_{N-1}$, have 16-bit memory locations, and the other half,
from $M_N$ to