SIMD-VITERBI(3)
NAME
create_viterbi27, init_viterbi27, update_viterbi27,
chainback_viterbi27, delete_viterbi27, create_viterbi29, init_viterbi29,
update_viterbi29, chainback_viterbi29, delete_viterbi29 - IA32
SIMD-assisted Viterbi decoders
SYNOPSIS
#include "viterbi27.h"
void *create_viterbi27(int blocklen);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
void emms_viterbi27(void);
extern char id_viterbi27[];
#include "viterbi29.h"
void *create_viterbi29(int blocklen);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
void emms_viterbi29(void);
extern char id_viterbi29[];
DESCRIPTION
These functions implement high performance Viterbi decoders for two
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27") and a rate 1/2 k=9 code ("viterbi29"). The decoders use
the Intel IA32 SIMD instruction sets, if available, to improve
performance.
There are three different IA32 SIMD instruction sets. The most common
is MMX, first implemented on later Intel Pentiums and then on the Intel
Pentium II and most Intel clones (AMD K6, Transmeta Crusoe, etc.). SSE
was introduced on the Pentium III and later implemented in the AMD
Athlon 4 (AMD calls it "3DNow! Professional"). Most recently, SSE2 was
introduced in the Intel Pentium 4. As of late 2001, there are no other
known implementations of SSE2.
Four separate static libraries implement the decoders, one for each
target. -lviterbi_port uses no SIMD instructions; it is intended for
pre-MMX IA32 machines and for non-IA32 machines. -lviterbi_mmx is for
IA32 machines that support the MMX instructions; -lviterbi_sse is for
machines with the SSE instructions; and -lviterbi_sse2 is for machines
with SSE2 support. The function names and calling conventions are the
same for all four versions, although the sizes of certain internal data
structures differ.
A shared library, -lviterbi, is also provided; it is assumed to refer to
the correct version for the current machine.
USAGE
Two versions of each function are provided, one for the k=7 code and
another for the k=9 code. In the following discussion the k=7 code will
be assumed. To use the k=9 code, simply change all references to
"viterbi27" to "viterbi29".
Before Viterbi decoding can begin, an instance must first be created
with create_viterbi27(). This function creates and returns a pointer
to an internal control structure containing the path metrics and the
branch decisions. create_viterbi27() takes one argument that gives the
length of the data block in bits. You must not attempt to decode a
block longer than the length given to create_viterbi27().
After a decoder instance is created, and before decoding a new frame,
init_viterbi27() must be called to reset the decoder state. It accepts
the instance pointer returned by create_viterbi27() and the initial
starting state of the convolutional encoder (usually 0). If the initial
starting state is unknown or incorrect, the decoder will still function
but the decoded data may be incorrect at the start of the block.
Each pair of received symbols is processed with a call to
update_viterbi27(). Each symbol is expected to range from 0 through
15, with 0 corresponding to a "strong 0" and 15 corresponding to a
"strong 1". The caller is responsible for determining the proper
pairing of input symbols (commonly known as decoder symbol phasing).
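As an illustration of preparing input for update_viterbi27(), the
following sketch maps a demodulator soft output onto the 0 through 15
symbol range. The helper quantize_symbol() is hypothetical and not part
of the library, and the scaling (-1.0 for a noise-free 0, +1.0 for a
noise-free 1) is an assumption; adjust it to your demodulator.

```c
#include <assert.h>

/* Hypothetical helper, not part of the library: map a soft value to the
 * 0..15 symbol range expected by update_viterbi27().
 * Assumption: the demodulator output is scaled so that -1.0 is a
 * noise-free 0 and +1.0 is a noise-free 1. */
static unsigned char quantize_symbol(double x)
{
    double scaled = (x + 1.0) * 7.5;    /* -1.0 -> 0.0, +1.0 -> 15.0 */
    int q;

    if (scaled <= 0.0)
        return 0;                       /* clip to "strong 0" */
    if (scaled >= 15.0)
        return 15;                      /* clip to "strong 1" */
    q = (int)(scaled + 0.5);            /* round to nearest level */
    assert(q >= 0 && q <= 15);
    return (unsigned char)q;
}
```

Each pair of quantized values would then be passed as sym1 and sym2 in a
single call to update_viterbi27().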
At the end of the block, the data is recovered with a call to
chainback_viterbi27(). The arguments are the pointer to the decoder
instance, a pointer to a user-supplied buffer into which the decoded
data is to be written, the number of data bits (not bytes) that are to
be decoded, and the terminal state of the convolutional encoder at the
end of the frame (usually 0). If the terminal state is incorrect or
unknown, the decoded data bits at the end of the frame may be
unreliable. The decoded data is written in big-endian order, i.e., the first
bit in the frame is written into the high order bit of the first byte
in the buffer. If the frame is not an integral number of bytes long,
the low order bits of the last byte in the frame will be unused.
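The bit ordering described above can be sketched as follows. Both
helpers are hypothetical and not part of the library: decoded_bytes()
gives the buffer size needed for a frame of nbits bits, and
get_decoded_bit() reads one bit from a buffer filled by
chainback_viterbi27().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold nbits decoded bits. */
static size_t decoded_bytes(unsigned int nbits)
{
    return ((size_t)nbits + 7) / 8;
}

/* Hypothetical helper: return bit i of the frame (i = 0 is the first
 * bit), assuming the first bit sits in the high order bit of data[0]. */
static int get_decoded_bit(const unsigned char *data, unsigned int nbits,
                           unsigned int i)
{
    assert(i < nbits);                  /* bits past nbits are unused */
    return (data[i >> 3] >> (7 - (i & 7))) & 1;
}
```

For a 1000-bit frame, decoded_bytes(1000) is 125, so the user-supplied
buffer must be at least 125 bytes long.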
Note that the decoders assume the use of a tail, i.e., the encoding and
transmission of a sufficient number of padding bits beyond the end of
the user data to force the convolutional encoder into the known
terminal state given to chainback_viterbi27(). The k=7 code uses 6 tail bits
(12 tail symbols) and the k=9 code uses 8 tail bits (16 tail symbols).
The tail bits are not included in the length arguments to
create_viterbi27() and chainback_viterbi27(). For example, if the block
contains 1000 user bits, then this would be the length parameter given
to create_viterbi27() and chainback_viterbi27(), and update_viterbi27()
would be called a total of 1006 times - the last 6 with the 12 encoded
symbols representing the tail bits.
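The tail arithmetic can be checked with a short sketch. The helpers
update_calls() and total_symbols() are hypothetical and not part of the
library; they only restate the rule that a constraint length k code
needs k-1 tail bits and that each bit is sent as two symbols.

```c
#include <assert.h>

/* Hypothetical helper: number of update_viterbiXX() calls for a frame
 * of nbits user bits, i.e. the user bits plus k-1 tail bits. */
static unsigned int update_calls(unsigned int nbits, unsigned int k)
{
    assert(k == 7 || k == 9);           /* the two supported codes */
    return nbits + (k - 1);
}

/* Hypothetical helper: received symbols per frame (rate 1/2 code). */
static unsigned int total_symbols(unsigned int nbits, unsigned int k)
{
    return 2 * update_calls(nbits, k);
}
```

With 1000 user bits and k=7, update_calls() gives 1006 and
total_symbols() gives 2012, while the length passed to
create_viterbi27() and chainback_viterbi27() stays 1000.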
After the call to chainback_viterbi27(), the decoder may be reset with
a call to init_viterbi27() and another block can be decoded.
Alternatively, delete_viterbi27() can be called to free all resources used by
the Viterbi decoder.
The MMX and SSE versions of the decoder use registers aliased onto the
Intel floating point registers, so you must insert calls to
emms_viterbi27() between calls to update_viterbi27() and any subsequent
floating point computations in your program. You need not do this after
every call to update_viterbi27() if you perform floating point
computations only after the end of the frame. In this case you may defer
the call to
emms_viterbi27() until after chainback_viterbi27() has been called.
emms_viterbi27() is a no-op in the portable and SSE2 versions of the
decoder, so you can safely call it regardless of library version. (The
SSE2 version uses the XMM registers, which do not interfere with the
x87 floating point stack. Hence emms calls are not necessary with this
version.)
The global character string id_viterbi27[] identifies the decoder
version in use.
RETURN VALUES
create_viterbi27() returns a pointer to the structure containing the
decoder state. update_viterbi27() returns the amount by which the
decoder path metrics were normalized in the current step. Only the
portable C, SSE and SSE2 versions perform normalization; the MMX version
uses modulo arithmetic.
AUTHOR
Phil Karn, KA9Q (karn@ka9q.net)