WFDB (WaveForm DataBase)
External format for physiologic time-series records developed at the MIT Laboratory for Computational Physiology and used as the native storage format of PhysioNet / PhysioBank. A record is not a single file; it is a set of files that share a record name and are distinguished by extension. There is no magic byte and no embedded record manifest — the catalog of files is the directory itself.
A WFDB record has up to four file kinds:
- Header file: text, suffix .hea. Required (except for EDF/BDF records, which the WFDB library accepts as drop-ins). Carries the record-level metadata: sampling frequency, signal count, signal count per signal file, per-signal format and scaling, optional start time/date, optional segment list, and arbitrary trailing info strings.
- Signal file(s): binary, conventionally suffix .dat. Carry the samples. One signal file may hold several signals interleaved (channel-multiplexed), and a record may use any number of signal files. Naming and location are dictated by the header file.
- Annotation file(s): binary, suffix is the annotator name (e.g. .atr for reviewed reference annotations, .qrs for an automatic QRS detector). Carry labels anchored to individual sample times.
- Calibration file: text, not record-bound. One file per installation describes calibration-pulse semantics for signal types (ECG, ABP, Resp, …) and is referenced through the WFDBCAL environment variable.
The reference texts are the PhysioNet WFDB Applications Guide manual pages at https://physionet.org/physiotools/wag/header-5.htm, https://physionet.org/physiotools/wag/signal-5.htm, https://physionet.org/physiotools/wag/annot-5.htm, and https://physionet.org/physiotools/wag/wfdbca-5.htm, with the chapter on file types in the WFDB Programmer’s Guide at https://physionet.org/physiotools/wpg/wpg_39.htm.
File identification
WFDB has no magic bytes. Detection is structural:
- Extension: .hea identifies the header. A directory containing one or more .hea files is a WFDB record set. Signal files conventionally use .dat; annotation files use an installation-defined annotator suffix.
- Record discovery: given a record name R, the WFDB library opens R.hea, reads the signal file names from inside, and locates those files relative to the WFDB path (the WFDB environment variable, an ordered list of directories).
- Companion records: a directory may host many records side by side. A collection-level RECORDS file may enumerate them, but the WFDB library does not require it; reading is record-name driven.
The record name itself never includes the .hea suffix. A name that
contains / denotes a multi-segment record where the integer after
the / is the segment count.
Header file
The header is a line-oriented ASCII text file. Lines are separated by LF, optionally preceded by CR (WFDB 6.1+ accepts CRLF). No line may exceed 255 bytes including the terminator.
- Comments: any line whose first printable character is # is a comment. Comments may appear anywhere. Comment lines that come after the last signal specification line are reserved for info strings (see below); other comments are ignored by the WFDB library.
- Empty lines are ignored.
- Fields within a non-comment line are separated by spaces or tabs, except for the compound fields described below where a specific delimiter (/, x, :, +, parentheses) is bound to a field with no surrounding whitespace.
- Numbers are read with the C standard scanf rules for the underlying type, so 360, 360., 360.0, and 3.6e2 are all legal and equivalent for a floating-point field.
Record line
The first non-empty, non-comment line is the record line. It
describes the record as a whole. Fields, left to right, with [opt]
marking optional fields:
| # | Field | Type | Form / delimiter |
|---|---|---|---|
| 1 | record_name | string | [A-Za-z0-9_]+ |
| 1a | segment_count [opt] | uint | record_name/segment_count |
| 2 | signal_count | uint | space-separated |
| 3 | sampling_frequency [opt] | float | space-separated |
| 3a | counter_frequency [opt] | float | sampling_frequency/counter_frequency |
| 3b | base_counter_value [opt] | float | (value) |
| 4 | samples_per_signal [opt] | uint | space-separated |
| 5 | base_time [opt] | string | HH:MM:SS |
| 6 | base_date [opt] | string | DD/MM/YYYY |
Each optional field is admissible only when every previous optional field is also present.
- record_name: identifier. Allowed characters are ASCII letters, digits, and underscore.
- segment_count: present only when the record name carries a /n suffix; signals that the file is a multi-segment record and that the lines that follow the record line are segment specification lines rather than signal specification lines. The count must be positive; the value 1 is legal but unusual.
- signal_count: number of signals in the record. May be zero (used by annotation-only records and by the layout segment of a variable-layout multi-segment record). It is not the number of signal files: several signals may share one file (a signal group).
- sampling_frequency: in samples per second per signal. If absent, the value defaults to DEFREQ = 250.0. Must be strictly positive.
- counter_frequency: ticks per second of an external counter (e.g. analog tape counter, chart-recording page index). Used by strtim to convert counter-based time strings (c123) into sample indices. Defaults to sampling_frequency when absent or non-positive.
- base_counter_value: counter reading that corresponds to sample index 0. Defaults to 0.
- samples_per_signal: length of the record in samples per signal. If zero or absent, the record length is unspecified and checksum verification is disabled.
- base_time: wall-clock time at which sample 0 was recorded, 24-hour clock, e.g. 13:05:00 (or 13:5:0). Defaults to 00:00:00.
- base_date: date at which sample 0 was recorded, day-month-year, e.g. 25/4/1989 for 25 April 1989.
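The field rules above can be sketched as a small parser. This is a minimal illustration, not the WFDB library's own reader: the names RecordLine and parse_record_line are hypothetical, and the regular expression assumes that the counter frequency and base counter value are attached to the sampling frequency exactly as shown in the table.

```python
import re
from typing import NamedTuple, Optional

DEFREQ = 250.0  # default sampling frequency named in the text


class RecordLine(NamedTuple):
    record_name: str
    segment_count: Optional[int]
    signal_count: int
    sampling_frequency: float
    counter_frequency: float
    base_counter_value: float
    samples_per_signal: int
    base_time: str
    base_date: Optional[str]


# sampling_frequency[/counter_frequency][(base_counter_value)]
_FREQ = re.compile(r"^([^/()]+)(?:/([^()]+))?(?:\(([^)]+)\))?$")


def parse_record_line(line: str) -> RecordLine:
    """Parse one record line (hypothetical helper, not a WFDB API)."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("record line needs a name and a signal count")
    name, _, segments = tokens[0].partition("/")
    fs, cf, base = DEFREQ, None, 0.0
    if len(tokens) > 2:
        match = _FREQ.match(tokens[2])
        if match is None:
            raise ValueError(f"bad frequency field: {tokens[2]!r}")
        fs = float(match.group(1))
        cf = float(match.group(2)) if match.group(2) else None
        base = float(match.group(3)) if match.group(3) else 0.0
    if fs <= 0:
        raise ValueError("sampling_frequency must be strictly positive")
    if cf is None or cf <= 0:
        cf = fs  # counter frequency falls back to the sampling frequency
    return RecordLine(
        record_name=name,
        segment_count=int(segments) if segments else None,
        signal_count=int(tokens[1]),
        sampling_frequency=fs,
        counter_frequency=cf,
        base_counter_value=base,
        samples_per_signal=int(tokens[3]) if len(tokens) > 3 else 0,
        base_time=tokens[4] if len(tokens) > 4 else "00:00:00",
        base_date=tokens[5] if len(tokens) > 5 else None,
    )
```

Because every optional field requires the previous ones, a positional split is enough; a stricter reader would also validate the time and date strings.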
Signal specification line
In a single-segment record (one without a segment_count), every
non-empty, non-comment line after the record line is a signal
specification line. The lines appear in signal order, starting from
signal 0. The header must contain at least signal_count of these;
extra trailing lines are not read.
Each line, left to right:
| # | Field | Type | Form / delimiter |
|---|---|---|---|
| 1 | signal_file_name | string | absolute / relative path, or - (stdin/stdout) |
| 2 | format | uint | space-separated; one of the codes in Signal formats |
| 2a | samples_per_frame [opt] | uint | formatxN (bound to format, prefix x) |
| 2b | skew [opt] | uint | format:S (bound to format, prefix :) |
| 2c | byte_offset [opt] | uint | format+B (bound to format, prefix +) |
| 3 | adc_gain [opt] | float | space-separated |
| 3a | baseline [opt] | int | adc_gain(baseline) |
| 3b | units [opt] | string | adc_gain/units (no whitespace) |
| 4 | adc_resolution [opt] | uint | space-separated, bits |
| 5 | adc_zero [opt] | int | space-separated |
| 6 | initial_value [opt] | int | space-separated |
| 7 | checksum [opt] | int | space-separated, signed 16-bit |
| 8 | block_size [opt] | int | space-separated, bytes |
| 9 | description [opt] | string | rest of line; may contain spaces |
The format modifiers x, :, + are bound to the format field: no
whitespace is allowed between the format integer and the
modifier. When several modifiers are present they are concatenated in the
order samples_per_frame, skew, byte_offset, each with its own prefix character (16x2:30+1024).
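Splitting a format token into its parts can be sketched with one regular expression. The helper name parse_format_field is hypothetical, and the pattern assumes the modifiers appear in the order x, :, + used by the example 16x2:30+1024.

```python
import re

# format[xN][:S][+B], with no whitespace anywhere in the token
_FORMAT = re.compile(r"^(\d+)(?:x(\d+))?(?::(\d+))?(?:\+(\d+))?$")


def parse_format_field(token: str) -> dict:
    """Split a format token into format code and modifiers (illustrative only)."""
    match = _FORMAT.match(token)
    if match is None:
        raise ValueError(f"bad format field: {token!r}")
    fmt, spf, skew, offset = match.groups()
    return {
        "format": int(fmt),
        "samples_per_frame": int(spf) if spf else 1,  # default 1
        "skew": int(skew) if skew else 0,             # default 0
        "byte_offset": int(offset) if offset else 0,  # default 0
    }
```

For example, parse_format_field("16x2:30+1024") yields format 16 with two samples per frame, a skew of 30 samples, and a 1024-byte preamble.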
Field-by-field detail follows.
signal_file_name
Path to the binary signal file holding this signal’s samples. The
WFDB library resolves the name against the WFDB path: a leading
empty path component makes absolute paths usable as-is, and a
directory not yet in WFDB is appended to it (WFDB 6.2+).
- The special name - means standard input (when reading) or standard output (when writing).
- The special name ~ is reserved for the layout segment of a variable-layout multi-segment record and never refers to a real file.
- Multiple signals may share one file; their signal specification lines must then be consecutive (and form a signal group).
- The combined length of signal_file_name and description is capped at 80 characters.
format
Integer code that selects the encoding of samples on disk. The codes defined by the WFDB library are listed in Signal formats; the most common in PhysioBank are format 212 (12-bit pair-packed, MIT-BIH legacy), format 16 (plain 16-bit two’s complement), and format 8 (8-bit first differences). Within one signal group every signal must use the same format.
The samples_per_frame, skew, and byte_offset modifiers attach
to this field directly with no separating whitespace, each marked by
its own prefix character.
samples_per_frame
Bound to format with the prefix character x. Default value is
1. A value N > 1 declares the signal as oversampled: it
contributes N samples to every record frame (see
Multi-frequency records) and was
digitized at N * sampling_frequency samples per second.
Non-integer multipliers are not supported. WFDB versions <= 8.3
ignore this field and cannot read oversampled records correctly.
skew
Bound to format with the prefix character :. Default value is
0. A positive integer S declares that this signal’s stream
precedes sample 0 by S samples relative to the rest of the
record (because of azimuth mismatch in a multi-track analog tape,
calibration offset, etc.). Those leading S samples are included in
the signal’s checksum but are not returned by getvec / getframe.
Editing the skew alone never changes the checksum. WFDB versions
<= 9.1 ignore this field.
byte_offset
Bound to format with the prefix character +. Default value is
0. A signal file containing a preamble (header bytes prepended by
a non-WFDB writer) sets byte_offset to the byte length of that
preamble; sample 0 begins at that offset. All signals in the same
signal group must declare the same offset. The preamble is excluded
from the checksum. The WFDB library only reads such files; it never
writes them. WFDB versions <= 4.4 ignore byte offsets and return
preamble data as if it were samples.
adc_gain
Floating-point number expressing the ADC’s slope as ADC units per
physical unit. A step of one physical unit at the analog input
produces an output that differs by adc_gain ADC units. For an ECG
this is roughly the R-wave amplitude in a lead aligned with the
cardiac axis. Zero or absent means the signal is uncalibrated; the
library substitutes DEFGAIN = 200.0 ADC units per physical unit.
baseline
Bound to adc_gain with parentheses. Integer that names the ADC
sample value corresponding to a physical reading of 0. Defaults to
adc_zero. The baseline does not have to fall inside the ADC range
— a temperature sensor mapped to 200..300 K can place its baseline
well below digital_min because 0 K lies outside the represented
range. WFDB versions <= 5.0 ignore this field.
units
Bound to adc_gain (after the optional baseline) with the prefix
character /. String without embedded whitespace naming the
physical unit of the signal: mV, mmHg, degC, l, …
Defaults to mV when absent. WFDB versions <= 4.7 ignore this
field.
adc_resolution
Bits of resolution of the analog-to-digital converter, typically
8..16. If absent or zero, the default is 12 for amplitude formats
and 10 for difference formats (format 8). Some formats imply a
lower resolution and override the default (e.g. 12 for format 212,
10 for formats 310 and 311).
adc_zero
Sample value that the ADC would produce for an input exactly in the
middle of its range. For a bipolar converter this is 0; for a
unipolar (offset-binary) converter it is the midpoint, e.g. 1024
for an 11-bit unipolar ADC. Together with adc_resolution this
fixes the range of legal sample values. Defaults to 0.
initial_value
Sample value at index 0. Used only by difference-coded formats
(format 8) to seed the cumulative sum. Defaults to adc_zero.
checksum
Signed 16-bit checksum of the reconstructed sample stream of this
signal. The checksum is computed on the decoded samples, not on the
on-disk bytes, so it does not change when a signal is reformatted.
It is verified only when the full record is read from start to end
and samples_per_signal is known. 0 is also used as a placeholder
when samples_per_signal is unspecified.
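A checksum over decoded samples can be sketched as follows. The text above does not spell out the arithmetic, so this example assumes the checksum is the plain sum of all decoded samples wrapped to a signed 16-bit integer; checksum16 is a hypothetical helper name.

```python
def checksum16(samples) -> int:
    """Sum the decoded samples and wrap the result to signed 16 bits.

    Assumption: the checksum is a simple wrapping sum; verify against the
    WFDB library before relying on it.
    """
    total = sum(samples) & 0xFFFF          # keep the low 16 bits
    return total - 0x10000 if total >= 0x8000 else total
```

Since the input is the decoded sample stream, the same function applies whatever the on-disk format is.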
block_size
Block size in bytes for reading the signal file. Almost always 0.
Non-zero values are reserved for character special files (raw tape
or disk devices) where I/O must happen in fixed-size blocks. A
negative value flags the file as not seekable by fseek. All
signals in a signal group share the same block size.
description
Free-form text identifying the signal: lead name, sensor, body site,
etc. May include embedded spaces; runs to the end of the line.
Whitespace separating it from block_size is not part of the
description. When the description is missing, the library
synthesizes "record R, signal n". Conventional descriptions
include MLII and V1…V6 for ECG leads, and ABP or Resp for other signal types. The combined length
of signal_file_name and description cannot exceed 80 characters.
Info strings
Comment lines that follow the last signal specification line are not
discarded. They are exposed through getinfo and putinfo as info
strings: each line’s content after the leading # is one info
string. No whitespace may precede the # of an info-string line.
The convention used in PhysioBank for subject metadata is
```
# <age>: 35 <sex>: M <diagnoses>: (none) <medications>: (none)
```
Info strings are not defined for the top-level header of a multi-segment record.
Multi-segment records
A multi-segment record concatenates several ordinary records along
the time axis. It is identified by the /N suffix on the record
name in the record line, where N is the segment count. After the
record line the header carries N segment specification lines
instead of signal specification lines:
| # | Field | Type |
|---|---|---|
| 1 | record_name | string |
| 2 | samples_per_signal | uint |
The segment record name must denote an ordinary (single-segment) record sitting next to the top-level header in the same directory or on the WFDB path. Each segment must declare its sample count in its own header.
Two flavours:
- Fixed-layout: all segments share the same signal arrangement, gain, baseline, units, ADC resolution and zero, and description. Storage formats may still differ from segment to segment, allowing per-segment compression choices.
- Variable-layout: relaxed constraints. The first segment (segment 0) is a layout segment: an ordinary record with a length of 0 samples whose only role is to declare the desired final arrangement of signals, gains, and baselines. A layout segment has no signal file; its signal specification lines use ~ as the file name. When read with WFDB 10.3.17 or later, the library scales, shifts, reorders, and zero-pads each subsequent segment to match the layout segment.
Segments may not nest. A segment specification line whose record name
is ~ denotes a null segment; reading such a segment yields the
sentinel value WFDB_INVALID_SAMPLE for every position and no signal
or header files are opened.
Examples
MIT-BIH record 100 (two interleaved ECG signals in format 212, one signal file, 30 minutes at 360 Hz):
```
100 2 360 650000 0:0:0 0/0/0
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
# 69 M 1085 1629 x1
# Aldomet, Inderal
```
AHA DB record 7001 (two ECG signals in format 8, each in its own absolute-path signal file, 250 Hz, 10-bit ADC):
```
7001 2 250 525000
/db1/data0/d0.7001 8 100 10 0 -53 -1279 0 ECG signal 0
/db1/data1/d1.7001 8 100 10 0 -69 15626 0 ECG signal 1
```
Local record 8l (16 signals in format 8, file names looked up via the WFDB path):
```
8l 16
data0 8
data1 8
data2 8
...
data15 8
```
Piped record 16x4 (four 16-bit signals streamed through standard I/O):
```
# Piped record 16x4. Use this record to read or write 4 signals
# using the standard I/O.
16x4 4
- 16
- 16
- 16
- 16
```
ahatape (two 16-bit signals streamed from a raw 9-track tape with 4096-byte blocks):
```
# Use this record on a UNIX system to read directly
# from a 9-track AHA DB distribution tape with
# 4096-byte blocks. The tape must be positioned
# to the beginning of the ECG data file before
# using this record.
ahatape 2 250
/dev/nrmt0 16 0 12 0 0 0 4096
/dev/nrmt0 16 0 12 0 0 0 4096
```
Multi-segment record multi (three segments, mixed formats):
```
multi/3 2 360 45000
100s 21600
null 1800
100s 21600
```
The total length 45000 equals the sum of the segment lengths
(21600 + 1800 + 21600). The middle segment null is itself an
ordinary record built from format-0 (null) signals.
Signal formats
The format field of a signal specification line selects how
samples are laid out in the signal file. All multi-byte fixed-width
integer formats use signed two’s complement. Endianness varies by
format and is called out below. All formats can be used in
multiplexed signal files, where samples from the file’s signal group
are interleaved sample by sample (see
Multiplexed signal files).
| Code | Width | Endian | Coding | Notes |
|---|---|---|---|---|
| 0 | — | — | null | placeholder, all samples decoded as zero |
| 8 | 8 | — | signed first diff | 1 byte per sample, requires initial_value |
| 16 | 16 | LE | two’s complement | most common amplitude format |
| 24 | 24 | LE | two’s complement | WFDB 10.5.0+ |
| 32 | 32 | LE | two’s complement | WFDB 10.5.0+ |
| 61 | 16 | BE | two’s complement | ”big-endian 16” |
| 80 | 8 | — | offset binary | sample minus 128 is the signed value |
| 160 | 16 | LE | offset binary | sample minus 32768 is the signed value |
| 212 | 12 | LE | two’s complement | two samples per 3 bytes, bit-packed |
| 310 | 10 | LE | two’s complement | three samples per two 16-bit words |
| 311 | 10 | LE | two’s complement | three samples per one 32-bit word |
| 508 | 8 | — | FLAC compressed | up to 8 channels, lossless |
| 516 | 16 | — | FLAC compressed | up to 8 channels, lossless |
| 524 | 24 | — | FLAC compressed | up to 8 channels, lossless |
WFDB does not store its own per-sample bit width separately from the
format code — the format implies the width. The adc_resolution
field describes the converter, not the on-disk bit width, and is
used only for digital-to-physical scaling and range reporting.
Format 0 (null)
No on-disk storage. Every sample reads as zero. Used as a filler in multi-segment records and for placeholder signals.
Format 8
Each sample is an 8-bit signed first difference, one byte per
sample. The reconstructed sample value at index n is
```
x[n] = initial_value + sum(b[0..n])
```
where b[i] is the signed byte read at file position i (relative
to byte_offset).
When a writer cannot encode a difference in 8 bits (the slew rate
would exceed +-127 LSB per sample), it emits the largest legal
difference of the right sign (-128 or +127) and continues
adjusting the next bytes so that the running sum reaches the true
sample as fast as possible. Encoding through format 8 is therefore
lossy if the source has steep transients.
In a multiplexed format-8 file, the first difference is taken between two consecutive samples of the same signal, not between adjacent bytes in the file. Otherwise two interleaved channels whose baselines differ by more than 128 ADC units could not be represented at all.
initial_value is mandatory: the reader needs the seed sample to
unroll the cumulative sum.
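The cumulative-sum decoding and the saturating behaviour of a writer can be sketched for a single-signal file. Both function names are hypothetical, and the encoder is only one plausible reading of "reach the true sample as fast as possible", not the WFDB library's code.

```python
from itertools import accumulate


def decode_format8(data: bytes, initial_value: int) -> list:
    """Rebuild samples from signed 8-bit first differences (single signal)."""
    diffs = [b - 256 if b > 127 else b for b in data]  # unsigned byte -> signed
    # accumulate() yields the seed first; drop it so x[n] = seed + sum(b[0..n])
    return list(accumulate(diffs, initial=initial_value))[1:]


def encode_format8(samples, initial_value: int) -> bytes:
    """Encode samples as clipped first differences (illustrative, lossy)."""
    out = bytearray()
    current = initial_value
    for target in samples:
        diff = max(-128, min(127, target - current))  # saturate the step
        current += diff                               # value a reader will see
        out.append(diff & 0xFF)
    return bytes(out)
```

Round-tripping a step larger than the 8-bit range shows the loss: the decoded signal ramps toward the target over several samples instead of jumping.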
Format 16
Each sample is a 16-bit signed integer in little-endian two’s
complement, least significant byte first. The most common amplitude
format. Historically the format used for MIT-BIH and AHA database
distribution on 9-track tapes also added a logical EOF marker (octal
0100000, decimal -32768) followed by null padding after the last
real sample; modern WFDB consumers ignore the trailing null
padding.
Format 24
Each sample is a 24-bit signed integer in little-endian two’s complement, three bytes per sample, least significant byte first. The high bit of the third byte is the sign bit. Available in WFDB 10.5.0 (March 2010) and later.
Format 32
Each sample is a 32-bit signed integer in little-endian two’s complement, four bytes per sample, least significant byte first. Available in WFDB 10.5.0 and later.
Format 61
Each sample is a 16-bit signed integer in big-endian two’s complement, most significant byte first. Format 61 is otherwise identical to format 16; only the byte order differs.
Format 80
Each sample is an 8-bit value in offset binary: the unsigned byte read from disk has to be reduced by 128 to obtain a signed 8-bit amplitude.
```
sample = byte - 128
```
Sample range is therefore -128..+127.
Format 160
Each sample is a 16-bit unsigned little-endian value in offset binary: the value has to be reduced by 32768 to obtain a signed 16-bit amplitude.
```
sample = uint16_le - 32768
```
Sample range is -32768..+32767. Byte order matches format 16: low
byte first.
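The fixed-width formats described so far (16, 24, 32, 61, 80, and 160) differ only in width, byte order, and whether an offset is subtracted, so one table-driven decoder covers them. This is a sketch for a raw sample stream with byte_offset already skipped; decode_fixed_width is a hypothetical name.

```python
# format code -> (bytes per sample, byte order, signed?, offset to subtract)
_FIXED = {
    16: (2, "little", True, 0),
    24: (3, "little", True, 0),
    32: (4, "little", True, 0),
    61: (2, "big", True, 0),
    80: (1, "little", False, 128),
    160: (2, "little", False, 32768),
}


def decode_fixed_width(data: bytes, fmt: int) -> list:
    """Decode a byte string of fixed-width samples; a trailing partial sample is ignored."""
    width, order, signed, offset = _FIXED[fmt]
    usable = len(data) - len(data) % width
    return [
        int.from_bytes(data[i:i + width], order, signed=signed) - offset
        for i in range(0, usable, width)
    ]
```

The returned list is still multiplexed when the file holds more than one signal.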
Format 212
Each sample is a 12-bit two’s complement value, two samples per three bytes, bit-packed.
For each group of three input bytes b0 b1 b2 (file offsets
3k, 3k+1, 3k+2):
```
pair  = b0 | (b1 << 8)          // first byte pair, little-endian
low12 = pair & 0x0FFF           // 12 LSB
high4 = (pair >> 12) & 0x0F     // top nibble of the first pair

sample[2k]   = sign_extend_12( low12 )
sample[2k+1] = sign_extend_12( (high4 << 8) | b2 )
```
i.e. the first sample occupies the 12 low bits of the first byte pair (LSB first within the pair); the second sample takes the remaining 4 high bits of that pair as its high nibble and the next single byte as its low 8 bits.
Sign extension: bit 11 is the sign bit, replicate it into bits
12..31.
The bit layout, MSB … LSB, for the 24 bits of the triplet b2 b1 b0:
```
b2[7..0]        b1[7..4]         b1[3..0]         b0[7..0]
sample1 low 8 | sample1 high 4 | sample0 high 4 | sample0 low 8
```
Most of the signal files in PhysioBank are written in format 212.
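A format 212 decoder can be sketched in a few lines. The function names are hypothetical; the code follows the layout used by the WFDB library, in which the high nibble of the second byte supplies the four most significant bits of the second sample and the third byte supplies its low eight bits.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def decode_format212(data: bytes) -> list:
    """Unpack two 12-bit samples from each complete 3-byte group."""
    samples = []
    for k in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[k], data[k + 1], data[k + 2]
        samples.append(sign_extend(b0 | ((b1 & 0x0F) << 8), 12))  # sample 2k
        samples.append(sign_extend(((b1 & 0xF0) << 4) | b2, 12))  # sample 2k+1
    return samples
```

The triplet E3 33 F3, chosen here so that it decodes to the initial values 995 and 1011 listed in the record 100 header, gives decode_format212(bytes([0xE3, 0x33, 0xF3])) == [995, 1011].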
Format 310
Each sample is a 10-bit two’s complement value, three samples per four bytes, bit-packed across two consecutive 16-bit little-endian words. The unused bit of each word is written as zero by the WFDB library.
For each group of four input bytes (two little-endian 16-bit words
w0 and w1):
```
w0 = b0 | (b1 << 8)      // first 16-bit word, LE
w1 = b2 | (b3 << 8)      // second 16-bit word, LE

sample[3k]   = sign_extend_10( (w0 >> 1) & 0x3FF )   // 11 LSB of w0, low bit dropped
sample[3k+1] = sign_extend_10( (w1 >> 1) & 0x3FF )   // 11 LSB of w1, low bit dropped
sample[3k+2] = sign_extend_10(                       // 5 MSB of each word concatenated
      ((w0 >> 11) & 0x1F)                            // becomes low 5 bits of sample 3k+2
    | (((w1 >> 11) & 0x1F) << 5))                    // high 5 bits of sample 3k+2
```
Bit 0 (the least significant bit) of each word is reserved: set to zero on write, ignored on read.
The bit layout, MSB … LSB, for the 32 bits of the quadruplet b3 b2 b1 b0 viewed as the two words w1 w0:
```
w1: [ sample2 high 5 ][ sample1, 10 bits ][ 0 ]
w0: [ sample2 low 5  ][ sample0, 10 bits ][ 0 ]
```
Format 311
Each sample is a 10-bit two’s complement value, three samples per four bytes, bit-packed into a single 32-bit little-endian word. The two top bits of the 32-bit word are unused and are written as zero.
For each group of four input bytes:
```
word = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)   // 32-bit LE

sample[3k]   = sign_extend_10(  word        & 0x3FF )   // bits 0..9
sample[3k+1] = sign_extend_10( (word >> 10) & 0x3FF )   // bits 10..19
sample[3k+2] = sign_extend_10( (word >> 20) & 0x3FF )   // bits 20..29
                                                        // bits 30..31 unused, 0
```
The bit layout, MSB … LSB:
```
bit 31..30   bit 29..20    bit 19..10    bit 9..0
unused       sample 3k+2   sample 3k+1   sample 3k
```
Sign extension: bit 9 is the sign bit of each 10-bit value;
replicate it into bits 10..31 of the decoded integer.
Differences from format 310 — both encode three 10-bit samples in four bytes:
- Format 310 splits the third sample across two 16-bit LE words and drops one bit in each word.
- Format 311 packs all three samples contiguously inside one 32-bit LE word and uses two top bits as padding instead of two scattered bits.
The two formats are not byte-compatible and a stream cannot be reinterpreted from one to the other without recoding.
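The two 10-bit layouts can be compared side by side in a short sketch. The function names are hypothetical; each decoder processes only complete four-byte groups and ignores any trailing bytes.

```python
def sign_extend_10(value: int) -> int:
    """Interpret a 10-bit field as two's complement (bit 9 is the sign)."""
    return (value ^ 0x200) - 0x200


def decode_format310(data: bytes) -> list:
    """Three samples from two 16-bit little-endian words."""
    out = []
    for k in range(0, len(data) - 3, 4):
        w0 = data[k] | (data[k + 1] << 8)
        w1 = data[k + 2] | (data[k + 3] << 8)
        out.append(sign_extend_10((w0 >> 1) & 0x3FF))
        out.append(sign_extend_10((w1 >> 1) & 0x3FF))
        # third sample: top 5 bits of w0 are its low half, top 5 of w1 its high half
        out.append(sign_extend_10(((w0 >> 11) & 0x1F) | (((w1 >> 11) & 0x1F) << 5)))
    return out


def decode_format311(data: bytes) -> list:
    """Three samples from one 32-bit little-endian word."""
    out = []
    for k in range(0, len(data) - 3, 4):
        word = int.from_bytes(data[k:k + 4], "little")
        out.extend(sign_extend_10((word >> shift) & 0x3FF) for shift in (0, 10, 20))
    return out
```

The samples 1, -1, 5 are stored as 02 28 FE 07 in format 310 and as 01 FC 5F 00 in format 311, which makes the byte-level incompatibility concrete.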
Formats 508, 516, and 524 (FLAC)
Signal data is compressed using the FLAC (Free Lossless Audio Codec) container. The last two digits of the format code give the bits per sample:
| Code | Bits per sample |
|---|---|
| 508 | 8 |
| 516 | 16 |
| 524 | 24 |
See the FLAC format reference at https://xiph.org/flac/format.html.
Constraints on the WFDB side:
- The number of WFDB signals in the file must equal the number of channels in the FLAC stream, so at most 8 signals.
- Every signal in the file must share the same sampling frequency and therefore the same samples-per-frame value.
Constraints on the FLAC side:
- The FLAC bits per sample field must be 8, 16, or 24.
- Every encoded sample must fall in the signed range named by the bits per sample field.
- The FLAC sample rate field should be set to 96000 regardless of the actual WFDB sampling frequency. The values 88200, 176400, and 192000 must not be used because they are rejected by older FLAC decoders.
The FLAC block size is independent of the WFDB frame size: a single FLAC block may contain several WFDB frames, and a single WFDB frame may straddle FLAC blocks.
Multiplexed signal files
A signal file may hold one signal, or several signals interleaved
sample-by-sample. The set of signals in the same file is a signal
group; their signal specification lines must be consecutive in the
header. WFDB applications discover signal groups through the
group field of WFDB_Siginfo.
If all signals in a group share the same sampling frequency and the
group contains n signals, the on-disk layout is sample-major:
```
frame[k] = s[0][k] s[1][k] s[2][k] ... s[n-1][k]
```
i.e. one sample from each signal in declaration order, then the next
frame, and so on. Successive samples of the same signal are spaced
n samples apart in the file. For oversampled signals see
Multi-frequency records; in that case
the same signal contributes several samples to one frame and the
intra-frame sample order matches the declaration order of those
samples.
Multiplexed files are the default in PhysioBank: CDROM-shipped and HTTP-served signal files are multiplexed whenever the record has more than one signal. Multiplexed layout is useful when storage is sequential-access only (tape), when seek times are high (optical disk), when many signals would exceed the per-process open-file limit, or when high-rate acquisition cannot tolerate per-signal file overhead.
A multiplexed file’s byte size for an n-signal record sampled at a
single rate is
```
file_bytes = byte_offset + n * samples_per_signal * sample_bytes
```
where sample_bytes follows from the format (see table above; for
the bit-packed formats compute the bytes from the triplet/pair
size). For format 212 the byte count is
byte_offset + ceil(n * samples_per_signal / 2) * 3 because every
two samples consume three bytes.
If samples_per_signal is zero in the header, the WFDB library
infers the length from the file size by inverting these
relationships and dividing by the channel count.
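For fixed-width formats at a single sampling rate, the interleaving and the size relationship can be sketched as below. All three helper names are hypothetical, and the bit-packed formats (212, 310, 311) would need their own byte arithmetic.

```python
def demultiplex(samples, n_signals: int) -> list:
    """Split an interleaved sample list into one list per signal."""
    usable = len(samples) - len(samples) % n_signals  # drop a partial last frame
    return [list(samples[i:usable:n_signals]) for i in range(n_signals)]


def expected_file_bytes(n_signals, samples_per_signal, sample_bytes, byte_offset=0):
    """Size of a single-rate multiplexed file in a fixed-width format."""
    return byte_offset + n_signals * samples_per_signal * sample_bytes


def infer_samples_per_signal(file_bytes, n_signals, sample_bytes, byte_offset=0):
    """Invert the size formula when the header leaves the length at zero."""
    return (file_bytes - byte_offset) // (n_signals * sample_bytes)
```

For instance, two format 16 signals of 650000 samples each occupy 2600000 bytes, and the inverse calculation recovers 650000 from that size.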
Multi-frequency records
When signals of different bandwidths are recorded together it is
often wasteful to sample them all at the same rate. WFDB 9.0+
supports records where each signal is sampled at an integer multiple
of a common frame rate. The frame rate is the value stored in the
record line’s sampling_frequency field; per-signal sampling
frequencies are derived by multiplying the frame rate by each
signal’s samples_per_frame field (default 1).
In a multi-frequency record:
- A frame contains one or more samples from each signal. Each signal contributes exactly samples_per_frame samples per frame in a fixed intra-frame order.
- The header’s sampling_frequency is the frame rate, the number of frames per second.
- The product samples_per_frame * sampling_frequency is the per-signal sample rate.
Two read modes are available to applications via setgvmode:
- Low-resolution (default): getvec returns one sample per signal per frame. Oversampled signals are decimated by averaging their samples_per_frame samples inside the frame.
- High-resolution: getvec returns one sample per signal per sample slot of the fastest signal. Slower signals are zero-order-held (the same value is replicated). In this mode sampfreq returns the high-resolution rate and all time-valued arguments to and from the WFDB library are measured in high-resolution sample intervals. WFDB 9.6+ also rewrites time fields of annotations through the same conversion.
The runtime default is selected by the WFDBGVMODE environment
variable (0 = low resolution; any other value = high resolution),
or by the compile-time DEFWFDBGVMODE if the variable is unset.
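The two read modes can be illustrated on a single decoded frame. This sketch makes several assumptions that the text does not fix: a frame stores all samples of signal 0, then all samples of signal 1, and so on; every samples_per_frame value divides the largest one; and the low-resolution average is returned as a float, whereas the library returns integers with a rounding rule not described here. All function names are hypothetical.

```python
def split_frame(frame, samples_per_frame):
    """Cut one frame into per-signal chunks, in declaration order."""
    chunks, pos = [], 0
    for spf in samples_per_frame:
        chunks.append(list(frame[pos:pos + spf]))
        pos += spf
    if pos != len(frame):
        raise ValueError("frame length does not match samples_per_frame")
    return chunks


def low_resolution(frame, samples_per_frame):
    """One value per signal: the mean of its samples inside the frame."""
    return [sum(c) / len(c) for c in split_frame(frame, samples_per_frame)]


def high_resolution(frame, samples_per_frame):
    """One row per slot of the fastest signal; slower signals are held."""
    chunks = split_frame(frame, samples_per_frame)
    slots = max(samples_per_frame)
    return [[c[j * len(c) // slots] for c in chunks] for j in range(slots)]
```

With samples_per_frame values of 4, 1, and 2, a seven-sample frame yields one averaged row in low resolution and four rows in high resolution.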
Digital-to-physical conversion
Sample values stored in a signal file are unitless ADC values. The
conversion to physical units is linear and uses the header’s
adc_gain and baseline fields:
```
physical = (sample - baseline) / adc_gain
```
equivalent to
```
physical = sample / adc_gain - baseline / adc_gain
```
with the inverse map used by writers:
```
sample = round(physical * adc_gain) + baseline
```
baseline defaults to adc_zero when not stored. adc_gain
defaults to DEFGAIN = 200.0 ADC units per physical unit when zero or
absent; units defaults to mV.
The pair (adc_resolution, adc_zero) describes the converter
hardware and is not used in the formula itself, but it constrains
the range of legal sample values: an R-bit bipolar ADC produces
values in [-2^(R-1), +2^(R-1) - 1], while a unipolar ADC with
adc_zero = M produces values in [M - 2^(R-1), M + 2^(R-1) - 1].
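The conversion formulas translate directly into code. This is a minimal sketch with hypothetical function names; Python's round uses round-half-to-even, which may differ from the rounding a particular writer applies.

```python
DEFGAIN = 200.0  # default gain named in the text, ADC units per physical unit


def to_physical(sample: int, adc_gain: float = DEFGAIN, baseline: int = 0) -> float:
    """physical = (sample - baseline) / adc_gain, with the default gain for 0."""
    gain = adc_gain if adc_gain else DEFGAIN
    return (sample - baseline) / gain


def to_digital(physical: float, adc_gain: float = DEFGAIN, baseline: int = 0) -> int:
    """sample = round(physical * adc_gain) + baseline."""
    gain = adc_gain if adc_gain else DEFGAIN
    return round(physical * gain) + baseline
```

With the record 100 header values (gain 200, baseline defaulting to the ADC zero of 1024), the initial sample 995 maps to (995 - 1024) / 200 = -0.145 mV.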
Annotation files
A record may carry zero or more annotation files. Each annotation
file has the name record.annotator, where record is the record
name and annotator is an installation-defined suffix (commonly
atr for reviewed reference annotations, qrs for an automatic
detector, etc.). Annotation files are binary; their format is
detected automatically when the file is opened.
The annotation type vocabulary is defined in <wfdb/ecgcodes.h> and
the auxiliary character-to-code mapping in <wfdb/ecgmap.h>.
MIT format
Variable-length, slightly over two bytes per annotation on average. Annotations are emitted in time order. Each annotation occupies an even number of bytes; the first byte of each pair is the least significant.
Per pair, with byte pair value P = byte[1] << 8 | byte[0]:
- The 6 most significant bits of P are the annotation type code A (A = (P >> 10) & 0x3F).
- The 10 least significant bits of P are the time delta I (I = P & 0x3FF), in sample intervals from the previous annotation (or from sample 0 for the first annotation).
When 0 < A <= ACMAX the type code A names an annotation as
defined in <wfdb/ecgcodes.h> and the annotation is real. Five
sentinel values reserve the high range for control records:
| A | Name | Meaning |
|---|---|---|
| 59 | SKIP | I = 0; the next four bytes hold a 32-bit time delta as a PDP-11 long: high 16 bits first, then low 16 bits, each pair stored low byte first. Use this to express delta intervals beyond 10 bits. |
| 60 | NUM | I is the num field assigned to this annotation and to every subsequent one until another NUM record appears. Initial num is 0. |
| 61 | SUB | I is the subtype field assigned to this annotation only. Subsequent annotations revert to subtype = 0. |
| 62 | CHN | I is the chan field assigned to this annotation and to every subsequent one until another CHN record appears. Initial chan is 0. |
| 63 | AUX | I is the byte length of an auxiliary blob carried in the next I bytes. If I is odd, an extra 0x00 byte is appended to restore even alignment; the padding byte is not counted in I. |
A = 0, I = 0 is the end-of-file sentinel.
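The byte-pair layout above can be sketched as a small tokenizer. This is a minimal illustration, not the WFDB library's reader: the name `iter_mit_records` is hypothetical, the 32-bit SKIP delta is assumed to be signed, and the sketch only splits the stream into (A, I, payload) records without attaching NUM, SUB, CHN, or AUX values to annotations.

```python
import struct

SKIP, NUM, SUB, CHN, AUX = 59, 60, 61, 62, 63


def pdp11_long(four_bytes: bytes) -> int:
    """Decode a PDP-11 long: high word first, each word low byte first."""
    hi, lo = struct.unpack("<HH", four_bytes)
    value = (hi << 16) | lo
    # Assumption: the 32-bit value is two's complement (signed).
    return value - (1 << 32) if value & 0x80000000 else value


def iter_mit_records(data: bytes):
    """Yield (A, I, payload) for each record up to the A = 0, I = 0 sentinel."""
    pos = 0
    while pos + 2 <= len(data):
        (pair,) = struct.unpack_from("<H", data, pos)
        pos += 2
        a, i = (pair >> 10) & 0x3F, pair & 0x3FF
        if a == 0 and i == 0:
            return  # end-of-file sentinel
        if a == SKIP:
            if pos + 4 > len(data):
                raise ValueError("SKIP record is truncated")
            # For SKIP, report the 32-bit delta in place of I.
            yield a, pdp11_long(data[pos:pos + 4]), b""
            pos += 4
        elif a == AUX:
            payload = data[pos:pos + i]
            if len(payload) < i:
                raise ValueError("AUX payload is truncated")
            yield a, i, payload
            pos += i + (i & 1)  # skip the pad byte when I is odd
        else:
            yield a, i, b""
    raise ValueError("annotation stream ends without the A = 0, I = 0 sentinel")
```

A caller would typically keep a running sample counter, adding I for ordinary annotations and the 32-bit delta for SKIP records.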
AHA format
Fixed-width: every annotation occupies exactly 16 bytes.
| Off | Size | Field |
|---|---|---|
| 0 | 1 | reserved, unused |
| 1 | 1 | AHA annotation code (single ASCII character) |
| 2 | 4 | time, PDP-11 long (see below) |
| 6 | 2 | annotation serial number |
| 8 | 8 | auxiliary, content depends on origin |
The PDP-11 long encoding is the same one used by SKIP in the MIT
format: high 16 bits first, then low 16 bits, each pair stored low
byte first.
The auxiliary 8-byte trailer carries different content depending on the writer:
- AHA distribution tapes: the trailing 8 bytes are unused; the time stored in bytes 2..5 is given in milliseconds from the beginning of the annotated segment rather than in sample intervals from the start of the record.
- WFDB-written AHA files: the time stored in bytes 2..5 is given in sample intervals from the start of the record. Byte 8 carries the MIT subtype, byte 9 carries the MIT type code, and bytes 10..15 hold up to six ASCII characters used as auxiliary text for RHYTHM and NOTE annotations.
AHA-format annotation files may be converted losslessly to MIT format, reducing storage by a factor of eight.
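The 16-byte layout lends itself to a single `struct` unpack. The sketch below decodes one record using the WFDB-written interpretation of the trailer. `AhaAnnotation` and `parse_aha_record` are hypothetical names, and the serial number is assumed to be stored low byte first, which the table above does not state.

```python
import struct
from typing import NamedTuple


class AhaAnnotation(NamedTuple):
    code: str      # AHA annotation code (byte 1)
    time: int      # bytes 2..5, PDP-11 long
    serial: int    # bytes 6..7 (assumed low byte first)
    subtype: int   # byte 8, MIT subtype (WFDB-written files only)
    mit_code: int  # byte 9, MIT type code (WFDB-written files only)
    aux: str       # bytes 10..15, NUL-padded ASCII text


def parse_aha_record(rec: bytes) -> AhaAnnotation:
    if len(rec) != 16:
        raise ValueError("an AHA annotation record is exactly 16 bytes")
    # x = reserved byte, c = code, HH = high/low words of the time,
    # H = serial, BB = subtype and MIT code, 6s = auxiliary text.
    code, hi, lo, serial, subtype, mit_code, aux = struct.unpack("<xcHHHBB6s", rec)
    return AhaAnnotation(
        code=code.decode("ascii", "replace"),
        time=(hi << 16) | lo,
        serial=serial,
        subtype=subtype,
        mit_code=mit_code,
        aux=aux.rstrip(b"\x00").decode("ascii", "replace"),
    )
```

For files from AHA distribution tapes the same unpack applies, but `time` is then in milliseconds and the last three fields carry no meaning.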
Calibration files
Calibration files are not bound to a single record. They describe
the calibration-pulse semantics for each signal type used in a
WFDB installation. The WFDBCAL environment variable names the
calibration file, which is located by searching the WFDB path. The
format is text, line-oriented, with CR/LF line endings.
Each entry is a single line:

    DESC<TAB>LOW HIGH TYPE SCALE UNITS

with one TAB between the description and the parameters, and spaces between the parameters.
| Field | Content |
|---|---|
| DESC | String (may contain spaces, no tabs) matched as a prefix against a signal’s description from the header file. * is the catch-all entry. |
| LOW | Physical value of the low-amplitude phase of the calibration pulse. - marks the signal as AC-coupled (the calibration size is then the peak-to-peak amplitude). |
| HIGH | Physical value of the high-amplitude phase, or - to mark the calibration size as undefined. |
| TYPE | One of sine, square, undefined. |
| SCALE | Customary plot scale, in physical units per centimetre. |
| UNITS | Physical unit, no embedded whitespace. Must match the signal’s units field exactly (e.g. mV, mmHg, degrees_Celsius). |
For DC-coupled signals LOW must be a real number. For AC-coupled
signals LOW is - and HIGH carries the peak-to-peak amplitude.
The function getcal returns the first entry whose DESC matches
the signal description as either an exact match or a prefix of it,
and whose UNITS matches the header units exactly. More specific
entries should appear before less specific ones (the prefix ECG lead II must come before the prefix ECG if they need different
calibrations).
Comment lines (#), empty lines, and malformed lines are ignored.
Example:

    # A simple example of a WFDB calibration file
    ECG	- 1 sine 1 mV
    NBP	0 100 square 100 mmHg
    IBP	0 - square 100 mmHg
    Resp	- - undefined 1 l

This file interprets ECG signals as AC-coupled with a 1 mV peak-to-peak sine calibration drawn at 1 mV/cm, non-invasive blood pressure as DC-coupled with a 0..100 mmHg square calibration drawn at 100 mmHg/cm, invasive blood pressure as DC-coupled with an undefined calibration amplitude, and respiration as AC-coupled with an undefined calibration shape.
Calibration entries for derived annotation streams use the
annotator name in place of DESC and the literal text units as
UNITS:

    edr	- - undefined 200 units
    ann	- - undefined 100 units

The entry tagged ann is the default for annotation streams that
do not have a matching entry by name.
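The entry format and the first-match lookup described in this section can be sketched as follows. This is an illustrative reimplementation under the rules stated above, not the WFDB library's `getcal`. The names `CalEntry`, `parse_cal_file`, and `find_cal` are hypothetical, and the handling of `*` as a catch-all follows the field table in this section.

```python
from dataclasses import dataclass
from typing import List, Optional

PULSE_TYPES = {"sine", "square", "undefined"}


@dataclass
class CalEntry:
    desc: str
    low: Optional[float]   # None means "-" (AC-coupled)
    high: Optional[float]  # None means "-" (size undefined)
    pulse_type: str
    scale: float
    units: str


def _value(text: str) -> Optional[float]:
    return None if text == "-" else float(text)


def parse_cal_file(text: str) -> List[CalEntry]:
    """Parse calibration text; comments, blank and malformed lines are skipped."""
    entries = []
    for line in text.splitlines():  # splitlines() also handles CR/LF endings
        if not line.strip() or line.startswith("#"):
            continue
        desc, tab, rest = line.partition("\t")
        fields = rest.split()
        if not tab or len(fields) != 5 or fields[2] not in PULSE_TYPES:
            continue
        try:
            entries.append(CalEntry(desc, _value(fields[0]), _value(fields[1]),
                                    fields[2], float(fields[3]), fields[4]))
        except ValueError:
            continue  # non-numeric LOW, HIGH, or SCALE
    return entries


def find_cal(entries: List[CalEntry], description: str, units: str) -> Optional[CalEntry]:
    """Return the first entry whose DESC prefixes description and whose UNITS match."""
    for entry in entries:
        if entry.units == units and (entry.desc == "*" or description.startswith(entry.desc)):
            return entry
    return None
```

Because the lookup stops at the first match, the order of entries in the file decides which calibration wins when several prefixes apply.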
Validation
A conforming reader rejects an input if any of the following holds:
- A header file is missing or unreadable through the WFDB path.
- A header line exceeds 255 bytes.
- The record line is missing, empty, or comment-only.
- The record name in the record line contains characters outside [A-Za-z0-9_].
- signal_count is missing, negative, or not a parseable integer.
- sampling_frequency is present but non-positive, NaN, or infinite.
- A trailing optional field appears without the preceding optional field (e.g. base_date without base_time).
- The header contains fewer than signal_count signal specification lines (or, for a multi-segment record, fewer than segment_count segment specification lines).
- A signal specification line names a format not listed in Signal formats.
- A format modifier appears with whitespace between it and the format integer (16 x2 instead of 16x2).
- Two signals share a signal file but disagree on format, byte_offset, or block_size.
- A signal file referenced by the header cannot be opened on the WFDB path.
- A signal file is shorter than the byte count implied by signal_count, samples_per_signal, samples_per_frame, and byte_offset for its signal group.
- adc_gain is zero or absent but the reader is asked to convert to physical units without falling back to DEFGAIN.
- For a difference-format signal (format 8) the initial_value field is missing.
- The on-disk signal data of a difference-format signal does not reconstruct to a value within the declared ADC range.
- A 12-bit / 10-bit packed format (212 / 310 / 311) has trailing bytes whose packed sample count is less than the declared number of samples.
- For format 310, the bit reserved as zero in each 16-bit word is non-zero. (The WFDB library writes it as zero; a non-zero value signals corruption.)
- For format 311, bits 30..31 of any 32-bit word are non-zero.
- For FLAC formats 508 / 516 / 524, the FLAC bits per sample field is not 8 / 16 / 24 respectively; the FLAC channel count differs from the WFDB signal count of the file; or any sample exceeds the range of the declared bit width.
- The header declares record_count’s segment count but a referenced segment record cannot be opened.
- A segment specification line declares a sample count different from the one stored in the segment’s own header.
- A multi-segment record nests another multi-segment record.
- A variable-layout record has a non-zero-length layout segment.
- A variable-layout record’s non-zero segment changes the sampling frequency relative to the layout segment.
- A computed signal checksum disagrees with the value stored in the header (best-effort; readers may treat this as a warning when samples_per_signal is zero).
- An annotation file SKIP record has I != 0 in its first byte pair, or fewer than four bytes follow it.
- An annotation file AUX record has fewer than I payload bytes (or I + 1 when I is odd).
- An annotation file ends without a final A = 0, I = 0 sentinel.
The validation order is at the reader’s discretion.
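A few of the record-line rules above can be sketched as a standalone check. This is only an illustration: `check_record_line` is a hypothetical helper, and it assumes the record line has the whitespace-separated layout `record_name[/segment_count] signal_count [sampling_frequency[/counter_frequency][(base_counter)]] ...`, which is defined in the header section rather than here.

```python
import math
import re
from typing import List

NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def check_record_line(line: str) -> List[str]:
    """Return a list of rule violations found in one record line."""
    problems = []
    if len(line.encode("utf-8")) > 255:
        problems.append("header line exceeds 255 bytes")
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return problems + ["record line is missing, empty, or comment-only"]
    # Assumed layout: an optional "/segment_count" suffix follows the name.
    name = fields[0].split("/", 1)[0]
    if not NAME_RE.fullmatch(name):
        problems.append("record name has characters outside [A-Za-z0-9_]")
    try:
        if int(fields[1]) < 0:
            problems.append("signal_count is negative")
    except (IndexError, ValueError):
        problems.append("signal_count is missing or not an integer")
    if len(fields) > 2:
        # Keep only the frequency itself, dropping any "/..." or "(...)" suffix.
        fs_text = re.split(r"[/(]", fields[2], maxsplit=1)[0]
        try:
            fs = float(fs_text)
        except ValueError:
            fs = math.nan
        if not (math.isfinite(fs) and fs > 0):
            problems.append("sampling_frequency is not positive and finite")
    return problems
```

Returning a list instead of raising on the first failure lets a reader apply the rules in whatever order it prefers and report all of them together.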
- The “WFDB format” is plural. A single record consists of a header text file plus one or more binary signal files and zero or more annotation files. Tools that consume “WFDB” must be told a record name, not a single filename.
- Sample ordering within a multi-signal file is channel-multiplexed (sample-major), unlike EDF, which is channel-major within each data record. Reading a single channel out of a multi-signal WFDB file requires striding through the file signal_count samples at a time.
- The signal file does not carry the sampling frequency, ADC gain, or any other metadata. Losing the header file leaves the signal bytes uninterpretable. The decoder must keep the .hea file paired with the data.
- Format 8 is lossy on steep transients because differences are clamped to ±127. Round-trip through format 8 is not bit-exact unless the source already respects the slew limit.
- The WFDB library reads EDF and EDF+ files transparently as if they were WFDB records (WFDB 10.4.5+). It does not write them; the mit2edf tool converts a WFDB record to EDF when needed. The WFDB library does not decode EDF+ annotation streams natively; rdedfann extracts them into a conventional annotation file.
- BDF and BDF+ (24-bit EDF variants) are also accepted as WFDB records.
- record_name matching for collection-level file lookups (RECORDS, ANNOTATORS, DBS) is filesystem-cased on case-sensitive hosts and case-insensitive on the others; the WFDB library does not normalize case.
- description is the field the calibration system keys on. A description of ECG lead II will be matched by a calibration entry of ECG (prefix match) unless a more specific entry exists earlier in the file.
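The striding described for channel-multiplexed files can be sketched for the simplest case. The example assumes format 16 (16-bit two's-complement samples, low byte first), one sample per frame for every signal, and no byte offset. `extract_channel` is a hypothetical helper, not a WFDB library call.

```python
import struct
from typing import List


def extract_channel(raw: bytes, signal_count: int, channel: int) -> List[int]:
    """Pull one channel out of interleaved 16-bit little-endian samples."""
    if not 0 <= channel < signal_count:
        raise ValueError("channel index out of range")
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[:count * 2])
    # Frames are laid out as s0 s1 ... s(n-1), s0 s1 ..., so one channel
    # is every signal_count-th sample starting at its own index.
    return list(samples[channel::signal_count])
```

The returned values are raw ADC units; converting them to physical units still requires the gain and baseline from the header file.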