
WFDB (WaveForm DataBase)

External format for physiologic time-series records developed at the MIT Laboratory for Computational Physiology and used as the native storage format of PhysioNet / PhysioBank. A record is not a single file; it is a set of files that share a record name and are distinguished by extension. There is no magic byte and no embedded record manifest — the catalog of files is the directory itself.

A WFDB record has up to four file kinds:

  • Header file — text, suffix .hea. Required (except for EDF/BDF records, which the WFDB library accepts as drop-ins). Carries the record-level metadata: sampling frequency, signal count, signal count per signal file, per-signal format and scaling, optional start time/date, optional segment list, and arbitrary trailing info strings.
  • Signal file(s) — binary, conventionally suffix .dat. Carry the samples. One signal file may hold several signals interleaved (channel-multiplexed), and a record may use any number of signal files. Naming and location are dictated by the header file.
  • Annotation file(s) — binary, suffix is the annotator name (e.g. .atr for reviewed reference annotations, .qrs for an automatic QRS detector). Carry per-sample labels.
  • Calibration file — text, not record-bound. One file per installation describes calibration-pulse semantics for signal types (ECG, ABP, Resp, …) and is referenced through the WFDBCAL environment variable.

The reference texts are the PhysioNet WFDB Applications Guide manual pages at https://physionet.org/physiotools/wag/header-5.htm, https://physionet.org/physiotools/wag/signal-5.htm, https://physionet.org/physiotools/wag/annot-5.htm, and https://physionet.org/physiotools/wag/wfdbca-5.htm, with the chapter on file types in the WFDB Programmer’s Guide at https://physionet.org/physiotools/wpg/wpg_39.htm.

WFDB has no magic bytes. Detection is structural:

  • The extension .hea identifies the header. A directory containing one or more .hea files is a WFDB record set. Signal files conventionally use .dat; annotation files use an installation-defined annotator suffix.
  • Record discovery — given a record name R, the WFDB library opens R.hea, reads the signal file names from inside, and locates those files relative to the WFDB path (the WFDB environment variable, an ordered list of directories).
  • Companion records — a directory may host many records side by side. A collection-level RECORDS file may enumerate them, but the WFDB library does not require it; reading is record-name driven.

The record name itself never includes the .hea suffix. A name that contains / denotes a multi-segment record where the integer after the / is the segment count.

The header is a line-oriented ASCII text file. Lines are separated by LF, optionally preceded by CR (WFDB 6.1+ accepts CRLF). No line may exceed 255 bytes including the terminator.

  • Comments — any line whose first printable character is # is a comment. Comments may appear anywhere. Comment lines that come after the last signal specification line are reserved for info strings (see below); other comments are ignored by the WFDB library.
  • Empty lines are ignored.
  • Fields within a non-comment line are separated by spaces or tabs, except for the compound fields described below where a specific delimiter (/, x, :, +, parentheses) is bound to a field with no surrounding whitespace.
  • Numbers are read with the C standard scanf rules for the underlying type, so 360, 360., 360.0, and 3.6e2 are all legal and equivalent for a floating-point field.

The first non-empty, non-comment line is the record line. It describes the record as a whole. Fields, left to right, with [opt] marking optional fields:

#    Field                        Type    Form / delimiter
1    record_name                  string  [A-Za-z0-9_]+
1a   segment_count [opt]          uint    record_name/segment_count
2    signal_count                 uint    space-separated
3    sampling_frequency [opt]     float   space-separated
3a   counter_frequency [opt]      float   sampling_frequency/counter_frequency
3b   base_counter_value [opt]     float   (value)
4    samples_per_signal [opt]     uint    space-separated
5    base_time [opt]              string  HH:MM:SS
6    base_date [opt]              string  DD/MM/YYYY

Each optional field is admissible only when every previous optional field is also present.

  • record_name — identifier. Allowed characters are ASCII letters, digits, and underscore.
  • segment_count — present only when the record name carries a /n suffix; signals that the file is a multi-segment record and that the lines that follow the record line are segment specification lines rather than signal specification lines. The count must be positive; the value 1 is legal but unusual.
  • signal_count — number of signals in the record. May be zero (used by annotation-only records and by the layout segment of a variable-layout multi-segment record). It is not the number of signal files: several signals may share one file (a signal group).
  • sampling_frequency — in samples per second per signal. If absent, the value defaults to DEFREQ = 250.0. Must be strictly positive.
  • counter_frequency — ticks per second of an external counter (e.g. analog tape counter, chart-recording page index). Used by strtim to convert counter-based time strings (c123) into sample indices. Defaults to sampling_frequency when absent or non-positive.
  • base_counter_value — counter reading that corresponds to sample index 0. Defaults to 0.
  • samples_per_signal — length of the record in samples per signal. If zero or absent, the record length is unspecified and checksum verification is disabled.
  • base_time — wall-clock time at which sample 0 was recorded, 24-hour clock, e.g. 13:05:00 (or 13:5:0). Defaults to 00:00:00.
  • base_date — date at which sample 0 was recorded, day-month-year, e.g. 25/4/1989 for 25 April 1989.
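
The record-line fields above can be pulled apart with a few lines of Python. This is a minimal sketch, not the WFDB library's parser: the names RecordLine and parse_record_line are hypothetical, and the code assumes the line has already been identified as the first non-empty, non-comment line of the header.

```python
import re
from typing import NamedTuple, Optional

DEFREQ = 250.0  # default sampling frequency named in the text

class RecordLine(NamedTuple):  # hypothetical container, for illustration only
    record_name: str
    segment_count: Optional[int]
    signal_count: int
    sampling_frequency: float
    counter_frequency: float
    base_counter_value: float
    samples_per_signal: Optional[int]
    base_time: Optional[str]
    base_date: Optional[str]

# sampling_frequency[/counter_frequency][(base_counter_value)]
_FREQ = re.compile(r"^([^/()]+)(?:/([^()]+))?(?:\(([^)]+)\))?$")

def parse_record_line(line: str) -> RecordLine:
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("record line needs a record name and a signal count")
    name, _, seg = tokens[0].partition("/")
    fs, cf, base = DEFREQ, 0.0, 0.0
    if len(tokens) > 2:
        m = _FREQ.match(tokens[2])
        if m is None:
            raise ValueError("malformed sampling frequency field")
        fs = float(m.group(1))
        cf = float(m.group(2)) if m.group(2) else 0.0
        base = float(m.group(3)) if m.group(3) else 0.0
    if fs <= 0:
        raise ValueError("sampling frequency must be strictly positive")
    if cf <= 0:
        cf = fs  # counter frequency defaults to the sampling frequency
    return RecordLine(
        record_name=name,
        segment_count=int(seg) if seg else None,
        signal_count=int(tokens[1]),
        sampling_frequency=fs,
        counter_frequency=cf,
        base_counter_value=base,
        samples_per_signal=int(tokens[3]) if len(tokens) > 3 else None,
        base_time=tokens[4] if len(tokens) > 4 else None,
        base_date=tokens[5] if len(tokens) > 5 else None,
    )
```

For example, parse_record_line("multi/3 2 360 45000") yields a segment count of 3, and a line such as "8l 16" falls back to the default sampling frequency.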

In a single-segment record (one without a segment_count), every non-empty, non-comment line after the record line is a signal specification line. The lines appear in signal order, starting from signal 0. The header must contain at least signal_count of these; extra trailing lines are not read.

Each line, left to right:

#    Field                      Type    Form / delimiter
1    signal_file_name           string  absolute / relative path, or - (stdin/stdout)
2    format                     uint    space-separated; one of the codes in Signal formats
2a   samples_per_frame [opt]    uint    formatxN (bound to format, prefix x)
2b   skew [opt]                 uint    format:S (bound to format, prefix :)
2c   byte_offset [opt]          uint    format+B (bound to format, prefix +)
3    adc_gain [opt]             float   space-separated
3a   baseline [opt]             int     adc_gain(baseline)
3b   units [opt]                string  adc_gain/units (no whitespace)
4    adc_resolution [opt]       uint    space-separated, bits
5    adc_zero [opt]             int     space-separated
6    initial_value [opt]        int     space-separated
7    checksum [opt]             int     space-separated, signed 16-bit
8    block_size [opt]           int     space-separated, bytes
9    description [opt]          string  rest of line; may contain spaces

The format modifiers x, :, + are bound to the format field — no whitespace is allowed between the format integer and the modifier. Multiple modifiers are concatenated in any order, each with its own prefix character (16x2:30+1024).
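
As an illustration of how the bound modifiers can be separated from the format code, here is a small Python sketch. The function name parse_format_field and the dictionary keys are hypothetical; the sketch accepts the modifiers in any order, as described above, and rejects embedded whitespace.

```python
import re

# A format integer followed by zero or more modifiers, each with its prefix.
_FORMAT_FIELD = re.compile(r"^(\d+)((?:[x:+]\d+)*)$")
_MODIFIER = re.compile(r"([x:+])(\d+)")
_KEYS = {"x": "samples_per_frame", ":": "skew", "+": "byte_offset"}

def parse_format_field(field: str) -> dict:
    """Split a token such as '16x2:30+1024' into its four components."""
    m = _FORMAT_FIELD.match(field)
    if m is None:
        raise ValueError(f"malformed format field: {field!r}")
    # Defaults from the text: one sample per frame, no skew, no preamble.
    result = {"format": int(m.group(1)), "samples_per_frame": 1,
              "skew": 0, "byte_offset": 0}
    for prefix, digits in _MODIFIER.findall(m.group(2)):
        result[_KEYS[prefix]] = int(digits)
    return result
```

Applied to the example token 16x2:30+1024, the sketch reports format 16, two samples per frame, a skew of 30 samples, and a 1024-byte preamble.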

Field-by-field detail follows.

signal_file_name is the path to the binary signal file holding this signal’s samples. The WFDB library resolves the name against the WFDB path: a leading empty path component makes absolute paths usable as-is, and a directory not yet in WFDB is appended to it (WFDB 6.2+).

  • The special name - means standard input (when reading) or standard output (when writing).
  • The special name ~ is reserved for the layout segment of a variable-layout multi-segment record and never refers to a real file.
  • Multiple signals may share one file; their signal specification lines must then be consecutive (and form a signal group).
  • The byte sum of signal_file_name and description is capped at 80 characters.

format is the integer code that selects the encoding of samples on disk. The codes defined by the WFDB library are listed in Signal formats; the most common in PhysioBank are format 212 (12-bit pair-packed, MIT-BIH legacy), format 16 (plain 16-bit two’s complement), and format 8 (8-bit first differences). Within one signal group every signal must use the same format.

The samples_per_frame, skew, and byte_offset modifiers attach to this field directly with no separating whitespace, each marked by its own prefix character.

samples_per_frame is bound to format with the prefix character x. Its default value is 1. A value N > 1 declares the signal as oversampled: it contributes N samples to every record frame (see Multi-frequency records) and was digitized at N * sampling_frequency samples per second. Non-integer multipliers are not supported. WFDB versions <= 8.3 ignore this field and cannot read oversampled records correctly.

skew is bound to format with the prefix character :. Its default value is 0. A positive integer S declares that this signal’s stream precedes sample 0 by S samples relative to the rest of the record (because of azimuth mismatch in a multi-track analog tape, calibration offset, etc.). Those leading S samples are included in the signal’s checksum but are not returned by getvec / getframe. Editing the skew alone never changes the checksum. WFDB versions <= 9.1 ignore this field.

byte_offset is bound to format with the prefix character +. Its default value is 0. A signal file containing a preamble (header bytes prepended by a non-WFDB writer) sets byte_offset to the byte length of that preamble; sample 0 begins at that offset. All signals in the same signal group must declare the same offset. The preamble is excluded from the checksum. The WFDB library only reads such files; it never writes them. WFDB versions <= 4.4 ignore byte offsets and return preamble data as if it were samples.

adc_gain is a floating-point number expressing the ADC’s slope as ADC units per physical unit. A step of one physical unit at the analog input produces an output that differs by adc_gain ADC units. For an ECG this is roughly the R-wave amplitude in a lead aligned with the cardiac axis. Zero or absent means the signal is uncalibrated; the library substitutes DEFGAIN = 200.0 ADC units per physical unit.

baseline is bound to adc_gain with parentheses. It is the integer ADC sample value that corresponds to a physical reading of 0, and it defaults to adc_zero. The baseline does not have to fall inside the ADC range: a temperature sensor mapped to 200..300 K can place its baseline well below the lowest sample value the ADC can produce, because 0 K lies outside the represented range. WFDB versions <= 5.0 ignore this field.

units is bound to adc_gain (after the optional baseline) with the prefix character /. It is a string without embedded whitespace naming the physical unit of the signal: mV, mmHg, degC, l, … It defaults to mV when absent. WFDB versions <= 4.7 ignore this field.

adc_resolution is the number of bits of resolution of the analog-to-digital converter, typically 8..16. If absent or zero, the default is 12 for amplitude formats and 10 for difference formats (format 8). Some formats imply their own resolution and override the default (e.g. 12 for format 212, 10 for formats 310 and 311).

adc_zero is the sample value that the ADC would produce for an input exactly in the middle of its range. For a bipolar converter this is 0; for a unipolar (offset-binary) converter it is the midpoint, e.g. 1024 for an 11-bit unipolar ADC. Together with adc_resolution this fixes the range of legal sample values. It defaults to 0.

initial_value is the sample value at index 0. It is used only by difference-coded formats (format 8) to seed the cumulative sum. It defaults to adc_zero.

checksum is the signed 16-bit checksum of the reconstructed sample stream of this signal. The checksum is computed on the decoded samples, not on the on-disk bytes, so it does not change when a signal is reformatted. It is verified only when the full record is read from start to end and samples_per_signal is known. 0 is also used as a placeholder when samples_per_signal is unspecified.
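
A hedged sketch of the checksum computation in Python. The text above only states that the checksum is a signed 16-bit value over the decoded samples; the sketch additionally assumes it is the plain sum of all samples truncated to 16 bits, which is how the WFDB library is generally understood to compute it. The name wfdb_checksum is hypothetical.

```python
def wfdb_checksum(samples):
    """Sum the decoded samples and wrap the total into a signed 16-bit value.

    Assumption: the checksum is the arithmetic sum of every sample of one
    signal, reduced modulo 2**16 and reinterpreted as two's complement.
    """
    total = sum(samples) & 0xFFFF
    return total - 0x10000 if total >= 0x8000 else total
```

Because the sum is taken over decoded samples, recoding a signal from one storage format to another leaves the result unchanged, as long as the recoding is lossless.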

block_size is the block size in bytes for reading the signal file. It is almost always 0. Non-zero values are reserved for character special files (raw tape or disk devices) where I/O must happen in fixed-size blocks. A negative value flags the file as not seekable by fseek. All signals in a signal group share the same block size.

description is free-form text identifying the signal: lead name, sensor, body site, etc. It may include embedded spaces and runs to the end of the line. Whitespace separating it from block_size is not part of the description. When the description is missing, the library synthesizes "record R, signal n". Conventional descriptions include MLII and V1..V6 for ECG leads, and ABP or Resp for other signal types. The combined byte length of signal_file_name and description cannot exceed 80 characters.

Comment lines that follow the last signal specification line are not discarded. They are exposed through getinfo and putinfo as info strings: each line’s content after the leading # is one info string. No whitespace may precede the # of an info-string line. The convention used in PhysioBank for subject metadata is

# <age>: 35 <sex>: M <diagnoses>: (none) <medications>: (none)

Info strings are not defined for the top-level header of a multi-segment record.

A multi-segment record concatenates several ordinary records along the time axis. It is identified by the /N suffix on the record name in the record line, where N is the segment count. After the record line the header carries N segment specification lines instead of signal specification lines:

#    Field                 Type
1    record_name           string
2    samples_per_signal    uint

The segment record name must denote an ordinary (single-segment) record sitting next to the top-level header in the same directory or on the WFDB path. Each segment must declare its sample count in its own header.

Two flavours:

  • Fixed-layout — all segments share the same signal arrangement, gain, baseline, units, ADC resolution and zero, and description. Storage formats may still differ from segment to segment, allowing per-segment compression choices.
  • Variable-layout — relaxed constraints. The first segment (segment 0) is a layout segment: an ordinary record with a length of 0 samples whose only role is to declare the desired final arrangement of signals, gains, and baselines. A layout segment has no signal file; its signal specification lines use ~ as the file name. When read with WFDB 10.3.17 or later, the library scales, shifts, reorders, and zero-pads each subsequent segment to match the layout segment.

Segments may not nest. A segment specification line whose record name is ~ denotes a null segment; reading such a segment yields the sentinel value WFDB_INVALID_SAMPLE for every position and no signal or header files are opened.

MIT-BIH record 100 (two interleaved ECG signals in format 212, one signal file, 30 minutes at 360 Hz):

100 2 360 650000 0:0:0 0/0/0
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
# 69 M 1085 1629 x1
# Aldomet, Inderal

AHA DB record 7001 (two ECG signals in format 8, each in its own absolute-path signal file, 250 Hz, 10-bit ADC):

7001 2 250 525000
/db1/data0/d0.7001 8 100 10 0 -53 -1279 0 ECG signal 0
/db1/data1/d1.7001 8 100 10 0 -69 15626 0 ECG signal 1

Local record 8l (16 signals in format 8, file names looked up via the WFDB path):

8l 16
data0 8
data1 8
data2 8
...
data15 8

Piped record 16x4 (four 16-bit signals streamed through standard I/O):

# Piped record 16x4. Use this record to read or write 4 signals
# using the standard I/O.
16x4 4
- 16
- 16
- 16
- 16

ahatape (two 16-bit signals streamed from a raw 9-track tape with 4096-byte blocks):

# Use this record on a UNIX system to read directly
# from a 9-track AHA DB distribution tape with
# 4096-byte blocks. The tape must be positioned
# to the beginning of the ECG data file before
# using this record.
ahatape 2 250
/dev/nrmt0 16 0 12 0 0 0 4096
/dev/nrmt0 16 0 12 0 0 0 4096

Multi-segment record multi (three segments, mixed formats):

multi/3 2 360 45000
100s 21600
null 1800
100s 21600

The total length 45000 equals the sum of the segment lengths (21600 + 1800 + 21600). The middle segment null is itself an ordinary record built from format-0 (null) signals.

The format field of a signal specification line selects how samples are laid out in the signal file. All multi-byte fixed-width integer formats use signed two’s complement. Endianness varies by format and is called out below. All formats can be used in multiplexed signal files, where samples from the file’s signal group are interleaved sample by sample (see Multiplexed signal files).

Code   Width   Endian   Coding              Notes
0      n/a     n/a      null                placeholder, all samples decoded as zero
8      8       n/a      signed first diff   1 byte per sample, seeded by initial_value
16     16      LE       two’s complement    most common amplitude format
24     24      LE       two’s complement    WFDB 10.5.0+
32     32      LE       two’s complement    WFDB 10.5.0+
61     16      BE       two’s complement    “big-endian 16”
80     8       n/a      offset binary       byte minus 128 is the signed value
160    16      LE       offset binary       value minus 32768 is the signed value
212    12      LE       two’s complement    two samples per 3 bytes, bit-packed
310    10      LE       two’s complement    three samples per two 16-bit words
311    10      LE       two’s complement    three samples per one 32-bit word
508    8       n/a      FLAC compressed     up to 8 channels, lossless
516    16      n/a      FLAC compressed     up to 8 channels, lossless
524    24      n/a      FLAC compressed     up to 8 channels, lossless

WFDB does not store its own per-sample bit width separately from the format code — the format implies the width. The adc_resolution field describes the converter, not the on-disk bit width, and is used only for digital-to-physical scaling and range reporting.

Format 0 has no on-disk storage. Every sample reads as zero. It is used as a filler in multi-segment records and for placeholder signals.

In format 8, each sample is an 8-bit signed first difference, one byte per sample. The reconstructed sample value at index n is

x[n] = initial_value + sum(b[0..n])

where b[i] is the signed byte read at file position i (relative to byte_offset).

When a writer cannot encode a difference in 8 bits (the sample-to-sample change falls outside -128..+127), it emits the largest legal difference of the right sign (-128 or +127) and keeps adjusting the following bytes so that the running sum reaches the true sample as fast as possible. Encoding through format 8 is therefore lossy if the source has steep transients.

In a multiplexed format-8 file, the first difference is taken between two consecutive samples of the same signal, not between adjacent bytes in the file. Otherwise two interleaved channels whose baselines differ by more than 128 ADC units could not be represented at all.

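initial_value seeds the cumulative sum. A header that omits it falls back to adc_zero, so a writer should store it whenever the true seed differs from adc_zero; otherwise every reconstructed sample is shifted by the same constant.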
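
The cumulative-sum reconstruction for format 8 can be sketched in Python as follows. The function name decode_format8 is hypothetical; the sketch takes one initial value per signal in the group and keeps one running sum per signal, so that differences are accumulated per channel rather than per byte.

```python
def decode_format8(data: bytes, initial_values):
    """Decode a (possibly multiplexed) format-8 byte stream.

    initial_values holds the initial_value field of each signal in the
    group, in declaration order. Returns one list of samples per signal.
    """
    n = len(initial_values)
    running = list(initial_values)       # one running sum per signal
    signals = [[] for _ in range(n)]
    for i, byte in enumerate(data):
        channel = i % n                  # bytes are interleaved by signal
        diff = byte - 256 if byte >= 128 else byte  # signed 8-bit difference
        running[channel] += diff
        signals[channel].append(running[channel])
    return signals
```

With two interleaved channels whose baselines are far apart, for instance initial values 0 and 1000, each channel still decodes correctly because its differences never mix with those of its neighbour.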

In format 16, each sample is a 16-bit signed integer in little-endian two’s complement, least significant byte first. This is the most common amplitude format. Historically, MIT-BIH and AHA database distributions on 9-track tapes also added a logical EOF marker (octal 0100000, decimal -32768) followed by null padding after the last real sample; modern WFDB consumers ignore the trailing null padding.

In format 24, each sample is a 24-bit signed integer in little-endian two’s complement, three bytes per sample, least significant byte first. The high bit of the third byte is the sign bit. Available in WFDB 10.5.0 (March 2010) and later.

In format 32, each sample is a 32-bit signed integer in little-endian two’s complement, four bytes per sample, least significant byte first. Available in WFDB 10.5.0 and later.

In format 61, each sample is a 16-bit signed integer in big-endian two’s complement, most significant byte first. Format 61 is otherwise identical to format 16; only the byte order differs.

In format 80, each sample is an 8-bit value in offset binary: 128 is subtracted from the unsigned byte read from disk to obtain a signed 8-bit amplitude.

sample = byte - 128

Sample range is therefore -128..+127.

In format 160, each sample is a 16-bit unsigned little-endian value in offset binary: 32768 is subtracted from the value to obtain a signed 16-bit amplitude.

sample = uint16_le - 32768

Sample range is -32768..+32767. Byte order matches format 16: low byte first.
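
The fixed-width amplitude formats (16, 24, 32, 61, 80, 160) differ only in width, byte order, and whether an offset is subtracted, so one table-driven Python sketch can cover them. The names _LAYOUT and decode_fixed_width are hypothetical, and the sketch decodes a single non-multiplexed stream; any trailing partial sample is dropped.

```python
# format code -> (bytes per sample, byte order, offset to subtract)
_LAYOUT = {
    16: (2, "little", 0),
    24: (3, "little", 0),
    32: (4, "little", 0),
    61: (2, "big", 0),
    80: (1, "little", 128),      # offset binary, unsigned on disk
    160: (2, "little", 32768),   # offset binary, unsigned on disk
}

def decode_fixed_width(data: bytes, fmt: int):
    """Decode a byte string stored in one of the fixed-width formats."""
    width, order, offset = _LAYOUT[fmt]
    signed = offset == 0         # two's complement unless offset binary
    usable = len(data) - len(data) % width
    return [
        int.from_bytes(data[i:i + width], order, signed=signed) - offset
        for i in range(0, usable, width)
    ]
```

For example, the two bytes FF FE decode to -257 in format 16 and to -2 in format 61, which shows that only the byte order separates the two.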

In format 212, each sample is a 12-bit two’s complement value, two samples per three bytes, bit-packed.

For each group of three input bytes b0 b1 b2 (file offsets 3k, 3k+1, 3k+2):

pair = b0 | (b1 << 8) // first byte pair, little-endian
low12 = pair & 0x0FFF // 12 LSB
high4 = (pair >> 12) & 0x0F // top nibble of the first pair
sample[2k] = sign_extend_12( low12 )
sample[2k+1] = sign_extend_12( (high4 << 8) | b2 )

i.e. the first sample occupies the 12 low bits of the first byte pair (LSB first within the pair); the second sample takes the remaining 4 high bits of that pair as its own 4 high bits and the next single byte as its low 8 bits.

Sign extension: bit 11 is the sign bit, replicate it into bits 12..31.

The bit layout, MSB … LSB, for the 24 bits of the triplet b2 b1 b0:

b2[7..0] b1[7..4] b1[3..0] b0[7..0]
sample1 low 8 | sample1 high 4 | sample0 high 4 | sample0 low 8

Most of the signal files in PhysioBank are written in format 212.
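
A minimal Python sketch of a format 212 decoder. It assumes the layout used by the WFDB library's reader: the first sample is the low 12 bits of the first byte pair, and the second sample takes the top nibble of the second byte as its four high bits and the third byte as its low eight bits. The function names are hypothetical, and an incomplete trailing triplet is ignored.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def decode_format212(data: bytes):
    """Unpack pairs of 12-bit samples from consecutive 3-byte groups."""
    samples = []
    for k in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[k], data[k + 1], data[k + 2]
        first = b0 | ((b1 & 0x0F) << 8)   # low 12 bits of the byte pair
        second = ((b1 >> 4) << 8) | b2    # top nibble of b1, then b2
        samples.append(sign_extend(first, 12))
        samples.append(sign_extend(second, 12))
    return samples
```

The byte triplet E3 33 F3 decodes to the pair 995 and 1011, the initial values listed for the two signals in the MIT-BIH record 100 header shown earlier.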

In format 310, each sample is a 10-bit two’s complement value, three samples per four bytes, bit-packed across two consecutive 16-bit little-endian words. The unused bit of each word is written as zero by the WFDB library.

For each group of four input bytes (two little-endian 16-bit words w0 and w1):

w0 = b0 | (b1 << 8) // first 16-bit word, LE
w1 = b2 | (b3 << 8) // second 16-bit word, LE
sample[3k] = sign_extend_10( (w0 >> 1) & 0x3FF ) // 11 LSB of w0, low bit dropped
sample[3k+1] = sign_extend_10( (w1 >> 1) & 0x3FF ) // 11 LSB of w1, low bit dropped
sample[3k+2] = sign_extend_10( // 5 MSB of each word concatenated
((w0 >> 11) & 0x1F) // becomes low 5 bits of sample 3k+2
| (((w1 >> 11) & 0x1F) << 5)) // high 5 bits of sample 3k+2

The lowest bit (bit 0) of each word is reserved (set to zero on write, ignored on read).

The bit layout, MSB … LSB, for the 32 bits of the quadruplet b3 b2 b1 b0 viewed as the two words w1 w0:

w1: [ sample2 high 5 ][ sample1 high 6 .... sample1 low 5 ][ 0 ]
w0: [ sample2 low 5 ][ sample0 high 6 .... sample0 low 5 ][ 0 ]

In format 311, each sample is a 10-bit two’s complement value, three samples per four bytes, bit-packed into a single 32-bit little-endian word. The two top bits of the 32-bit word are unused and are written as zero.

For each group of four input bytes:

word = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24) // 32-bit LE
sample[3k] = sign_extend_10( word & 0x3FF ) // bits 0..9
sample[3k+1] = sign_extend_10( (word >> 10) & 0x3FF ) // bits 10..19
sample[3k+2] = sign_extend_10( (word >> 20) & 0x3FF ) // bits 20..29
// bits 30..31 unused, 0

The bit layout, MSB … LSB:

bit 31..30 bit 29..20 bit 19..10 bit 9..0
unused sample 3k+2 sample 3k+1 sample 3k

Sign extension: bit 9 is the sign bit of each 10-bit value; replicate it into bits 10..31 of the decoded integer.

Differences from format 310 — both encode three 10-bit samples in four bytes:

  • Format 310 splits the third sample across two 16-bit LE words and drops one bit in each word.
  • Format 311 packs all three samples contiguously inside one 32-bit LE word and uses two top bits as padding instead of two scattered bits.

The two formats are not byte-compatible and a stream cannot be reinterpreted from one to the other without recoding.
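
To make the difference concrete, the following Python sketch decodes both layouts from raw bytes. The function names are hypothetical; each decoder ignores an incomplete trailing group of four bytes.

```python
def _sx10(value: int) -> int:
    """Interpret the low 10 bits of value as two's complement."""
    return (value & 0x1FF) - (value & 0x200)

def decode_format310(data: bytes):
    """Three samples per two 16-bit little-endian words."""
    samples = []
    for k in range(0, len(data) - 3, 4):
        w0 = data[k] | (data[k + 1] << 8)
        w1 = data[k + 2] | (data[k + 3] << 8)
        samples.append(_sx10((w0 >> 1) & 0x3FF))   # bits 1..10 of w0
        samples.append(_sx10((w1 >> 1) & 0x3FF))   # bits 1..10 of w1
        # Third sample: top 5 bits of w0 are its low half, top 5 of w1 its high half.
        samples.append(_sx10(((w0 >> 11) & 0x1F) | (((w1 >> 11) & 0x1F) << 5)))
    return samples

def decode_format311(data: bytes):
    """Three samples per 32-bit little-endian word, two top bits unused."""
    samples = []
    for k in range(0, len(data) - 3, 4):
        word = int.from_bytes(data[k:k + 4], "little")
        for shift in (0, 10, 20):
            samples.append(_sx10((word >> shift) & 0x3FF))
    return samples
```

Feeding the same four bytes to both functions generally produces different samples, which is why a stream has to be recoded rather than relabelled when switching between the two formats.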

Formats 508, 516, and 524 store signal data compressed with FLAC (Free Lossless Audio Codec). The last two digits of the format code give the bits per sample:

Code   Bits per sample
508    8
516    16
524    24

See the FLAC format reference at https://xiph.org/flac/format.html.

Constraints on the WFDB side:

  • The number of WFDB signals in the file must equal the number of channels in the FLAC stream, so at most 8 signals.
  • Every signal in the file must share the same sampling frequency and therefore the same samples-per-frame value.

Constraints on the FLAC side:

  • The FLAC bits per sample field must be 8, 16, or 24.
  • Every encoded sample must fall in the signed range named by the bits per sample field.
  • The FLAC sample rate field should be set to 96000 regardless of the actual WFDB sampling frequency. The values 88200, 176400, and 192000 must not be used because they are rejected by older FLAC decoders.

The FLAC block size is independent of the WFDB frame size: a single FLAC block may contain several WFDB frames, and a single WFDB frame may straddle FLAC blocks.

A signal file may hold one signal, or several signals interleaved sample-by-sample. The set of signals in the same file is a signal group; their signal specification lines must be consecutive in the header. WFDB applications discover signal groups through the group field of WFDB_Siginfo.

If all signals in a group share the same sampling frequency and the group contains n signals, the on-disk layout is sample-major:

frame[k] = s[0][k] s[1][k] s[2][k] ... s[n-1][k]

i.e. one sample from each signal in declaration order, then the next frame, and so on. Successive samples of the same signal are spaced n samples apart in the file. For oversampled signals see Multi-frequency records; in that case the same signal contributes several samples to one frame and the intra-frame sample order matches the declaration order of those samples.

Multiplexed files are the default in PhysioBank: CDROM-shipped and HTTP-served signal files are multiplexed whenever the record has more than one signal. Multiplexed layout is useful when storage is sequential-access only (tape), when seek times are high (optical disk), when many signals would exceed the per-process open-file limit, or when high-rate acquisition cannot tolerate per-signal file overhead.

A multiplexed file’s byte size for an n-signal record sampled at a single rate is

file_bytes = byte_offset + n * samples_per_signal * sample_bytes

where sample_bytes follows from the format (see table above; for the bit-packed formats compute the bytes from the triplet/pair size). For format 212 the byte count is byte_offset + ceil(n * samples_per_signal / 2) * 3 because every two samples consume three bytes.

If samples_per_signal is zero in the header, the WFDB library infers the length from the file size by inverting these relationships and dividing by the channel count.
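
The size relationships can be written down directly. The Python sketch below is illustrative only: the function names are hypothetical, it covers single-rate records, and the handling of formats 310 and 311 (four bytes per three samples) is derived from their descriptions above rather than from a formula stated in this section.

```python
# Bytes per sample for the formats that are not bit-packed.
BYTES_PER_SAMPLE = {8: 1, 16: 2, 24: 3, 32: 4, 61: 2, 80: 1, 160: 2}

def expected_file_bytes(fmt, n_signals, samples_per_signal, byte_offset=0):
    """Size of a multiplexed signal file for a single-rate record."""
    total = n_signals * samples_per_signal
    if fmt == 212:                       # two samples per three bytes
        return byte_offset + (total + 1) // 2 * 3
    if fmt in (310, 311):                # three samples per four bytes
        return byte_offset + (total + 2) // 3 * 4
    return byte_offset + total * BYTES_PER_SAMPLE[fmt]

def infer_samples_per_signal(fmt, n_signals, file_bytes, byte_offset=0):
    """Invert the relationship when the header gives no record length."""
    payload = file_bytes - byte_offset
    if fmt == 212:
        total = payload // 3 * 2
    elif fmt in (310, 311):
        total = payload // 4 * 3
    else:
        total = payload // BYTES_PER_SAMPLE[fmt]
    return total // n_signals
```

For the MIT-BIH record 100 header shown earlier (two signals, 650000 samples each, format 212), the expected size is 1950000 bytes. For the bit-packed formats the inverse is only exact up to the padding of the last group.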

When signals of different bandwidths are recorded together it is often wasteful to sample them all at the same rate. WFDB 9.0+ supports records where each signal is sampled at an integer multiple of a common frame rate. The frame rate is the value stored in the record line’s sampling_frequency field; per-signal sampling frequencies are derived by multiplying the frame rate by each signal’s samples_per_frame field (default 1).

In a multi-frequency record:

  • A frame contains one or more samples from each signal. Each signal contributes exactly samples_per_frame samples per frame in a fixed intra-frame order.
  • The header’s sampling_frequency is the frame rate, the number of frames per second.
  • The product samples_per_frame * sampling_frequency is the per-signal sample rate.

Two read modes are available to applications via setgvmode:

  • Low-resolution (default): getvec returns one sample per signal per frame. Oversampled signals are decimated by averaging their samples_per_frame samples inside the frame.
  • High-resolution: getvec returns one sample per signal per sample slot of the fastest signal. Slower signals are zero-order-held (the same value is replicated). In this mode sampfreq returns the high-resolution rate and all time-valued arguments to and from the WFDB library are measured in high-resolution sample intervals. WFDB 9.6+ also rewrites time fields of annotations through the same conversion.

The runtime default is selected by the WFDBGVMODE environment variable (0 = low resolution; any other value = high resolution), or by the compile-time DEFWFDBGVMODE if the variable is unset.
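
The frame structure and the two read modes can be illustrated with a short Python sketch. It is not the library's getvec: the function names are hypothetical, the low-resolution average is returned as a floating-point mean (the library's rounding is not specified here), and the high-resolution mode assumes the fastest signal's samples_per_frame is a multiple of every other signal's.

```python
def split_frame(frame, samples_per_frame):
    """Cut one frame into per-signal chunks, in declaration order."""
    chunks, pos = [], 0
    for count in samples_per_frame:
        chunks.append(frame[pos:pos + count])
        pos += count
    if pos != len(frame):
        raise ValueError("frame length does not match samples_per_frame")
    return chunks

def low_resolution(frame, samples_per_frame):
    """One value per signal per frame: average the oversampled signals."""
    return [sum(c) / len(c) for c in split_frame(frame, samples_per_frame)]

def high_resolution(frame, samples_per_frame):
    """One vector per slot of the fastest signal; slower signals are held."""
    chunks = split_frame(frame, samples_per_frame)
    slots = max(samples_per_frame)
    return [[c[j * len(c) // slots] for c in chunks] for j in range(slots)]
```

With samples_per_frame = [1, 4], the frame [10, 1, 2, 3, 4] yields one low-resolution vector [10.0, 2.5] and four high-resolution vectors in which the first signal's value 10 is repeated.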

Sample values stored in a signal file are unitless ADC values. The conversion to physical units is linear and uses the header’s adc_gain and baseline fields:

physical = (sample - baseline) / adc_gain

equivalent to

physical = sample / adc_gain - baseline / adc_gain

with the inverse map used by writers:

sample = round(physical * adc_gain) + baseline

baseline defaults to adc_zero when not stored. adc_gain defaults to DEFGAIN = 200.0 units per physical unit when zero or absent; units defaults to mV.

The pair (adc_resolution, adc_zero) describes the converter hardware and is not used in the formula itself, but it constrains the range of legal sample values: an R-bit bipolar ADC produces values in [-2^(R-1), +2^(R-1) - 1], while a unipolar ADC with adc_zero = M produces values in [M - 2^(R-1), M + 2^(R-1) - 1].
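
The two conversion formulas translate directly into Python. This is a minimal sketch with hypothetical function names; it applies the DEFGAIN fallback for a zero gain and expects the caller to pass adc_zero as the baseline when the header stores none. Note that Python's round resolves exact halves to the nearest even integer, which may differ from a writer that rounds halves away from zero.

```python
DEFGAIN = 200.0  # default gain named in the text, ADC units per physical unit

def to_physical(sample, adc_gain=0.0, baseline=0):
    """physical = (sample - baseline) / adc_gain"""
    gain = adc_gain if adc_gain else DEFGAIN
    return (sample - baseline) / gain

def to_digital(physical, adc_gain=0.0, baseline=0):
    """sample = round(physical * adc_gain) + baseline"""
    gain = adc_gain if adc_gain else DEFGAIN
    return round(physical * gain) + baseline
```

Using the first signal of the MIT-BIH record 100 header (gain 200, no stored baseline, adc_zero 1024), the initial value 995 corresponds to (995 - 1024) / 200 = -0.145 mV.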

A record may carry zero or more annotation files. Each annotation file has the name record.annotator, where record is the record name and annotator is an installation-defined suffix (commonly atr for reviewed reference annotations, qrs for an automatic detector, etc.). Annotation files are binary; their format is detected automatically when the file is opened.

The annotation type vocabulary is defined in <wfdb/ecgcodes.h> and the auxiliary character-to-code mapping in <wfdb/ecgmap.h>.

MIT-format annotation files are variable-length, slightly over two bytes per annotation on average. Annotations are emitted in time order. Each annotation occupies an even number of bytes; the first byte of each pair is the least significant.

Per pair, with byte pair value P = byte[1] << 8 | byte[0]:

  • The 6 most significant bits of P are the annotation type code A (A = (P >> 10) & 0x3F).
  • The 10 least significant bits of P are the time delta I (I = P & 0x3FF), in sample intervals from the previous annotation (or from sample 0 for the first annotation).

When 0 < A <= ACMAX the type code A names an annotation as defined in <wfdb/ecgcodes.h> and the annotation is real. Five sentinel values reserve the high range for control records:

A    Name   Meaning
59   SKIP   I = 0; the next four bytes hold a 32-bit time delta as a PDP-11 long: high 16 bits first, then low 16 bits, each pair stored low byte first. Use this to express delta intervals beyond 10 bits.
60   NUM    I is the num field assigned to this annotation and to every subsequent one until another NUM record appears. Initial num is 0.
61   SUB    I is the subtype field assigned to this annotation only. Subsequent annotations revert to subtype = 0.
62   CHN    I is the chan field assigned to this annotation and to every subsequent one until another CHN record appears. Initial chan is 0.
63   AUX    I is the byte length of an auxiliary blob carried in the next I bytes. If I is odd, an extra 0x00 byte is appended to restore even alignment; the padding byte is not counted in I.

A = 0, I = 0 is the end-of-file sentinel.
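
A hedged Python sketch of a reader for this byte stream. It assumes, as in the WFDB library, that a SKIP record precedes the annotation whose time it extends, while NUM, SUB, CHN, and AUX records follow the annotation they modify. The function names and the dictionary keys are hypothetical, and the SKIP delta is treated as a signed 32-bit value.

```python
SKIP, NUM, SUB, CHN, AUX = 59, 60, 61, 62, 63

def pdp11_long(data: bytes, pos: int) -> int:
    """High 16-bit word first, then low word, each stored low byte first."""
    high = data[pos] | (data[pos + 1] << 8)
    low = data[pos + 2] | (data[pos + 3] << 8)
    value = (high << 16) | low
    return value - (1 << 32) if value & (1 << 31) else value

def decode_mit_annotations(data: bytes):
    annotations, time, num, chan, pos = [], 0, 0, 0, 0
    while pos + 2 <= len(data):
        pair = data[pos] | (data[pos + 1] << 8)
        pos += 2
        if pair == 0:                     # A = 0, I = 0: end of file
            break
        code, interval = pair >> 10, pair & 0x3FF
        if code == SKIP:                  # long time delta for the next annotation
            time += pdp11_long(data, pos)
            pos += 4
        elif code == NUM:                 # sticky num field
            num = interval
            if annotations:
                annotations[-1]["num"] = num
        elif code == SUB:                 # subtype of the last annotation only
            if annotations:
                annotations[-1]["subtype"] = interval
        elif code == CHN:                 # sticky chan field
            chan = interval
            if annotations:
                annotations[-1]["chan"] = chan
        elif code == AUX:                 # blob of `interval` bytes, padded to even
            if annotations:
                annotations[-1]["aux"] = bytes(data[pos:pos + interval])
            pos += interval + (interval & 1)
        else:                             # ordinary annotation
            time += interval
            annotations.append({"time": time, "type": code, "subtype": 0,
                                "chan": chan, "num": num, "aux": b""})
    return annotations
```

The running time is the sum of all deltas seen so far, so each decoded annotation carries an absolute sample index rather than an interval.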

AHA-format annotation files are fixed-width: every annotation occupies exactly 16 bytes.

Off   Size   Field
0     1      reserved, unused
1     1      AHA annotation code (single ASCII character)
2     4      time, PDP-11 long (see below)
6     2      annotation serial number
8     8      auxiliary, content depends on origin

The PDP-11 long encoding is the same one used by SKIP in the MIT format: high 16 bits first, then low 16 bits, each pair stored low byte first.

The auxiliary 8-byte trailer carries different content depending on the writer:

  • AHA distribution tapes — the trailing 8 bytes are unused; the time stored in bytes 2..5 is given in milliseconds from the beginning of the annotated segment rather than in sample intervals from the start of the record.
  • WFDB-written AHA files — the time stored in bytes 2..5 is given in sample intervals from the start of the record. Byte 8 carries the MIT subtype, byte 9 carries the MIT type code, and bytes 10..15 hold up to six ASCII characters used as auxiliary text for RHYTHM and NOTE annotations.
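
A small Python sketch that splits one 16-byte AHA record into its fields, following the offsets in the table and the WFDB-written interpretation of the trailer. The function name and the dictionary keys are hypothetical. The byte order of the two-byte serial number is not stated above; the sketch assumes low byte first, in line with the other 16-bit quantities.

```python
def parse_aha_annotation(record: bytes) -> dict:
    """Decode one fixed-width AHA annotation record (16 bytes)."""
    if len(record) != 16:
        raise ValueError("an AHA annotation record is exactly 16 bytes")
    # PDP-11 long: high 16-bit word first, each word stored low byte first.
    high = record[2] | (record[3] << 8)
    low = record[4] | (record[5] << 8)
    return {
        "aha_code": chr(record[1]),
        "time": (high << 16) | low,
        "serial": record[6] | (record[7] << 8),  # assumed little-endian
        "mit_subtype": record[8],                # meaningful for WFDB-written files
        "mit_type": record[9],                   # meaningful for WFDB-written files
        "aux": record[10:16].rstrip(b"\x00"),    # up to six ASCII characters
    }
```

For files read from AHA distribution tapes the trailer fields carry no information, and the time value is in milliseconds rather than sample intervals, so a caller has to know the origin of the file before interpreting the result.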

AHA-format annotation files may be converted losslessly to MIT format, reducing storage by roughly a factor of eight (16 bytes per annotation against slightly over two).

Calibration files are not bound to a single record. They describe the calibration-pulse semantics for each signal type used in a WFDB installation. Their location is resolved through the WFDBCAL environment variable using the WFDB path. The format is text, line-oriented, with CR/LF line endings.

Each entry is a single line:

DESC<TAB>LOW HIGH TYPE SCALE UNITS

with one TAB between the description and the parameters, and spaces between the parameters.

Field   Content
DESC    String (may contain spaces, no tabs) matched as a prefix against a signal’s description from the header file. * is the catch-all entry.
LOW     Physical value of the low-amplitude phase of the calibration pulse. - marks the signal as AC-coupled (the calibration size is then the peak-to-peak amplitude).
HIGH    Physical value of the high-amplitude phase, or - to mark the calibration size as undefined.
TYPE    One of sine, square, undefined.
SCALE   Customary plot scale, in physical units per centimetre.
UNITS   Physical unit, no embedded whitespace. Must match the signal’s units field exactly (e.g. mV, mmHg, degrees_Celsius).

For DC-coupled signals LOW must be a real number. For AC-coupled signals LOW is - and HIGH carries the peak-to-peak amplitude. The function getcal returns the first entry whose DESC matches the signal description as either an exact match or a prefix of it, and whose UNITS matches the header units exactly. More specific entries should appear before less specific ones (the prefix ECG lead II must come before the prefix ECG if they need different calibrations).

Comment lines (#), empty lines, and malformed lines are ignored.
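
The line format and the first-match lookup can be sketched as follows. This is an illustration under stated assumptions, not the behaviour of the WFDB library's getcal: CalEntry, parse_cal_line, and find_cal are hypothetical names, a - in LOW or HIGH is represented as None, and the * entry is assumed to still require a UNITS match.

```python
from typing import List, NamedTuple, Optional


class CalEntry(NamedTuple):
    desc: str
    low: Optional[float]   # None when LOW is "-" (AC-coupled)
    high: Optional[float]  # None when HIGH is "-" (size undefined)
    kind: str              # "sine", "square", or "undefined"
    scale: float           # physical units per centimetre
    units: str


def parse_cal_line(line: str) -> Optional[CalEntry]:
    """Parse one calibration line; return None for lines a reader ignores."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    desc, tab, rest = line.partition("\t")  # DESC ends at the first TAB
    fields = rest.split()
    if not tab or not desc or len(fields) != 5:
        return None  # malformed
    low_s, high_s, kind, scale_s, units = fields
    if kind not in ("sine", "square", "undefined"):
        return None
    try:
        low = None if low_s == "-" else float(low_s)
        high = None if high_s == "-" else float(high_s)
        scale = float(scale_s)
    except ValueError:
        return None
    return CalEntry(desc, low, high, kind, scale, units)


def find_cal(entries: List[CalEntry], description: str, units: str) -> Optional[CalEntry]:
    """Return the first entry whose DESC is a prefix of description and whose UNITS match."""
    for entry in entries:
        # Assumption: the "*" catch-all must still agree on units.
        if (entry.desc == "*" or description.startswith(entry.desc)) and entry.units == units:
            return entry
    return None
```

Because the lookup stops at the first match, the order of entries in the file decides which calibration a description such as ECG lead II receives.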

Example:

# A simple example of a WFDB calibration file
ECG	- 1 sine 1 mV
NBP	0 100 square 100 mmHg
IBP	0 - square 100 mmHg
Resp	- - undefined 1 l

This file interprets ECG signals as AC-coupled with a 1 mV peak-to-peak sine calibration drawn at 1 mV/cm; non-invasive blood pressure (NBP) as DC-coupled with a 0..100 mmHg square calibration drawn at 100 mmHg/cm; invasive blood pressure (IBP) as DC-coupled with a square calibration of undefined amplitude drawn at 100 mmHg/cm; and respiration (Resp) as AC-coupled with an undefined calibration size and shape, drawn at 1 l/cm.

Calibration entries for derived annotation streams use the annotator name in place of DESC and the literal text units as UNITS:

edr	- - undefined 200 units
ann	- - undefined 100 units

The entry tagged ann is the default for annotation streams that do not have a matching entry by name.

A conforming reader rejects an input when any of the following conditions holds:

  • A header file is missing or unreadable through the WFDB path.
  • A header line exceeds 255 bytes.
  • The record line is missing, empty, or comment-only.
  • The record name in the record line contains characters outside [A-Za-z0-9_].
  • signal_count is missing, negative, or not a parseable integer.
  • sampling_frequency is present but non-positive, NaN, or infinite.
  • A trailing optional field appears without the preceding optional field (e.g. base_date without base_time).
  • The header lacks at least signal_count signal specification lines (or, for a multi-segment record, segment_count segment specification lines).
  • A signal specification line names a format not listed in Signal formats.
  • A format modifier appears with whitespace between it and the format integer (16 x2 instead of 16x2).
  • Two signals share a signal file but disagree on format, byte_offset, or block_size.
  • A signal file referenced by the header cannot be opened on the WFDB path.
  • A signal file is shorter than the byte count implied by signal_count, samples_per_signal, samples_per_frame, and byte_offset for its signal group.
  • adc_gain is zero or absent but the reader is asked to convert to physical units without falling back to DEFGAIN.
  • For a difference-format signal (format 8) the initial_value field is missing.
  • The on-disk signal data of a difference-format signal does not reconstruct to a value within the declared ADC range.
  • A file in a packed 12-bit or 10-bit format (212, 310, 311) ends in a partial group of bytes that yields fewer samples than the header declares.
  • For format 310, the bit reserved as zero in each 16-bit word is non-zero. (The WFDB library writes it as zero; a non-zero value signals corruption.)
  • For format 311, bits 30..31 of any 32-bit word are non-zero.
  • For FLAC formats 508 / 516 / 524, the FLAC bits per sample field is not 8 / 16 / 24 respectively; the FLAC channel count differs from the WFDB signal count of the file; or any sample exceeds the range of the declared bit width.
  • The header declares a segment_count, but a referenced segment record cannot be opened.
  • A segment specification line declares a sample count different from the one stored in the segment’s own header.
  • A multi-segment record nests another multi-segment record.
  • A variable-layout record has a non-zero-length layout segment.
  • A non-zero-length segment of a variable-layout record changes the sampling frequency relative to the layout segment.
  • A computed signal checksum disagrees with the value stored in the header (best-effort; readers may treat this as a warning when samples_per_signal is zero).
  • An annotation file SKIP record has I != 0 in its first byte pair, or fewer than four bytes follow it.
  • An annotation file AUX record has fewer than I payload bytes (or I + 1 when I is odd).
  • An annotation file ends without a final A = 0, I = 0 sentinel.

The validation order is at the reader’s discretion.
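
A few of the header-level checks can be sketched as a function that collects problems instead of stopping at the first one. This is an illustration only: check_header_basics is a hypothetical name, the fields are assumed to be already split out of the record line and passed as strings, and the 255-byte limit is assumed to exclude the line terminator.

```python
import math
import re
from typing import List, Optional

RECORD_NAME = re.compile(r"[A-Za-z0-9_]+")
MAX_LINE_BYTES = 255  # assumption: counted without the line terminator


def check_header_basics(header: bytes, record_name: str, signal_count: str,
                        sampling_frequency: Optional[str] = None) -> List[str]:
    """Return a list of rule violations; an empty list means these checks passed."""
    problems = []
    for number, line in enumerate(header.splitlines(), start=1):
        if len(line) > MAX_LINE_BYTES:
            problems.append(f"line {number} exceeds {MAX_LINE_BYTES} bytes")
    if not RECORD_NAME.fullmatch(record_name):
        problems.append("record name has characters outside [A-Za-z0-9_]")
    try:
        if int(signal_count) < 0:
            problems.append("signal_count is negative")
    except ValueError:
        problems.append("signal_count is not an integer")
    if sampling_frequency is not None:
        try:
            fs = float(sampling_frequency)
        except ValueError:
            fs = math.nan
        if not math.isfinite(fs) or fs <= 0:
            problems.append("sampling_frequency is not a positive finite number")
    return problems
```

Collecting every violation is one possible design; a reader that stops at the first failure is equally conforming, since the order of checks is not prescribed.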

  • The “WFDB format” is plural. A single record consists of a header text file plus one or more binary signal files and zero or more annotation files. Tools that consume “WFDB” must be told a record name, not a single filename.
  • Sample ordering within a multi-signal file is channel-multiplexed (sample-major), unlike EDF, which is channel-major within each data record. Reading a single channel out of a multi-signal WFDB file requires striding through the file one frame at a time, where a frame holds one sample for each signal stored in that file (or samples_per_frame samples for a signal that declares more than one).
  • The signal file does not carry the sampling frequency, ADC gain, or any other metadata. Losing the header file leaves the signal bytes uninterpretable. The decoder must keep the .hea file paired with the data.
  • Format 8 is lossy on steep transients because each stored difference is clamped to the signed 8-bit range (-128..+127). A round trip through format 8 is bit-exact only if the source already respects that slew limit.
  • The WFDB library reads EDF and EDF+ files transparently as if they were WFDB records (WFDB 10.4.5+). It does not write them; the mit2edf tool converts a WFDB record to EDF when needed. The WFDB library does not decode EDF+ annotation streams natively; rdedfann extracts them into a conventional annotation file.
  • BDF and BDF+ (24-bit EDF variants) are also accepted as WFDB records.
  • record_name matching for collection-level file lookups (RECORDS, ANNOTATORS, DBS) follows the host filesystem: it is case-sensitive on case-sensitive hosts and case-insensitive elsewhere. The WFDB library does not normalize case.
  • description is the field the calibration system keys on. A description of ECG lead II will be matched by a calibration entry of ECG (prefix match) unless a more specific entry exists earlier in the file.
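
Two of the notes above, channel striding and the format 8 slew limit, can be illustrated with a short sketch. All function names are hypothetical. The first function assumes format 16 is a 16-bit little-endian two's-complement sample and that every signal contributes one sample per frame. The format 8 functions assume each stored byte is a signed difference from the previously reconstructed value, with the first difference taken relative to initial_value and a clamp range of -128..+127.

```python
import struct
from typing import List


def extract_channel_format16(data: bytes, signals_in_file: int, channel: int) -> List[int]:
    """Pull one channel out of interleaved 16-bit samples by striding over frames."""
    if not 0 <= channel < signals_in_file:
        raise ValueError("channel out of range")
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data[:count * 2])
    return list(samples[channel::signals_in_file])


def encode_format8(samples: List[int], initial_value: int) -> bytes:
    """Encode first differences, clamping each one to a signed byte."""
    out = bytearray()
    current = initial_value
    for sample in samples:
        step = max(-128, min(127, sample - current))
        out += struct.pack("b", step)
        current += step  # track what a decoder will reconstruct
    return bytes(out)


def decode_format8(data: bytes, initial_value: int) -> List[int]:
    """Rebuild amplitudes by accumulating the stored differences."""
    current = initial_value
    out = []
    for (step,) in struct.iter_unpack("b", data):
        current += step
        out.append(current)
    return out
```

Under these assumptions, encoding [0, 300, 300, 300] with an initial value of 0 and decoding it again gives [0, 127, 254, 300]: the step is reached only on the fourth sample, which is the loss on steep transients described above.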