Internet-Draft | Matroska Format | October 2023 |
Lhomme, et al. | Expires 24 April 2024 | [Page] |
This document defines the Matroska audiovisual data container structure, including definitions of its structural elements, as well as its terminology, vocabulary, and application.¶
This document updates [RFC8794] to permit the use of a previously reserved EBML Element ID.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 24 April 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Matroska is an audiovisual data container format. It was derived from a project called [MCF], but diverges from it significantly because it is based on EBML (Extensible Binary Meta Language) [RFC8794], a binary derivative of XML. EBML provides significant advantages in terms of future format extensibility, without breaking file support in parsers reading the previous versions.¶
First, it is essential to clarify exactly "What an Audio/Video container is", to avoid any misunderstandings:¶
Matroska is designed with the future in mind. It incorporates features such as:¶
This document covers Matroska versions 1, 2, 3 and 4. Matroska v4 is the current version. Matroska 1 to 3 are no longer maintained. No new elements are expected in files with version numbers 1, 2, or 3.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document defines specific terms in order to define the format and application of Matroska. Specific terms are defined below:¶
Matroska
:Matroska Reader
:Matroska
.¶
Matroska Player: A Matroska Reader with a primary purpose of playing audiovisual files, including Matroska documents.¶
Matroska Writer
:Matroska
documents.¶
Matroska is a Document Type of EBML (Extensible Binary Meta Language). This specification is dependent on the EBML Specification [RFC8794]. For an understanding of Matroska's EBML Schema, see in particular the sections of the EBML Specification covering EBML Element Types (Section 7), EBML Schema (Section 11.1), and EBML Structure (Section 3).¶
Because of an oversight, [RFC8794] reserved EBML ID 0x80, which is used by deployed Matroska implementations. For this reason, this specification updates [RFC8794] to make 0x80 a legal EBML ID. Specifically, the following are changed in [RFC8794]:¶
In Section 17.1,¶
OLD:¶
One-octet Element IDs MUST be between 0x81 and 0xFE. These items are valuable because they are short, and they need to be used for commonly repeated elements. Element IDs are to be allocated within this range according to the "RFC Required" policy [RFC8126].¶
The following one-octet Element IDs are RESERVED: 0xFF and 0x80.¶
NEW:¶
One-octet Element IDs MUST be between 0x80 and 0xFE. These items are valuable because they are short, and they need to be used for commonly repeated elements. Element IDs are to be allocated within this range according to the "RFC Required" policy [RFC8126].¶
The following one-octet Element ID is RESERVED: 0xFF.¶
In Section 5,¶
OLD:¶
+=========================+================+=================+
| Element ID Octet Length | Range of Valid | Number of Valid |
|                         | Element IDs    | Element IDs     |
+=========================+================+=================+
| 1                       | 0x81 - 0xFE    | 126             |
+-------------------------+----------------+-----------------+¶
NEW:¶
+=========================+================+=================+
| Element ID Octet Length | Range of Valid | Number of Valid |
|                         | Element IDs    | Element IDs     |
+=========================+================+=================+
| 1                       | 0x80 - 0xFE    | 127             |
+-------------------------+----------------+-----------------+¶
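The updated range can be checked mechanically. The following is a minimal sketch and not part of this specification; the function name is_valid_one_octet_element_id is hypothetical and chosen only for illustration.¶

```python
def is_valid_one_octet_element_id(element_id: int) -> bool:
    """Return True if element_id is a valid one-octet EBML Element ID
    under the updated range: 0x80 to 0xFE inclusive (0xFF stays reserved)."""
    return 0x80 <= element_id <= 0xFE


# Count the valid one-octet IDs over every possible octet value.
# The result matches the "Number of Valid Element IDs" in the NEW table.
valid_count = sum(is_valid_one_octet_element_id(i) for i in range(0x100))
print(valid_count)
```

Under the old range (0x81 to 0xFE), the same count would be 126, which is the value shown in the OLD table.¶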
As an EBML Document Type, Matroska adds the following constraints to the EBML specification.¶
The Root Element and all Top-Level Elements MUST use 4 octets for their EBML Element ID, i.e., Segment and the direct children of Segment.¶
Legacy EBML/Matroska parsers did not properly handle Empty Elements, i.e., elements present in the file but with a length of zero. They always assumed the value was 0 for integers/dates or 0x0p+0 for floats (the textual expression of floats using the [ISO9899] format), ignoring the default value of the element, which should have been used instead. Therefore, Matroska writers MUST NOT use EBML Empty Elements if the element has a default value that is not 0 for integers/dates or 0x0p+0 for floats.¶
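As an illustration of the Empty Element rule, a writer could gate its use of Empty Elements on the element's default value. This is a minimal sketch under stated assumptions: the function name may_write_empty_element is hypothetical, and treating an element without any default as unaffected by this particular rule is an assumption of the sketch, not a statement of the specification.¶

```python
from typing import Optional, Union


def may_write_empty_element(default_value: Optional[Union[int, float]]) -> bool:
    """Decide whether this rule allows a zero-length (Empty) Element.

    default_value is the schema default of an integer, date, or float
    element, or None when the element has no default (assumption: the
    rule above only restricts elements that do have a default).
    """
    if default_value is None:
        return True
    # Legacy parsers read an Empty Element as 0 (or 0x0p+0 for floats),
    # so it is only safe when the real default is also zero.
    return default_value == 0


print(may_write_empty_element(0))                        # zero integer default
print(may_write_empty_element(float.fromhex("0x0p+0")))  # zero float default
print(may_write_empty_element(1000000))                  # non-zero default
```

For an element with a non-zero default, the writer would have to store the value explicitly instead of relying on an Empty Element.¶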
When adding new elements to Matroska, these rules apply:¶
A Matroska file MUST be composed of at least one EBML Document using the Matroska Document Type. Each EBML Document MUST start with an EBML Header and MUST be followed by the EBML Root Element, defined as Segment in Matroska. Matroska defines several Top-Level Elements which may occur within the Segment.¶
As an example, a simple Matroska file consisting of a single EBML Document could be represented like this:¶
A more complex Matroska file consisting of an EBML Stream (consisting of two EBML Documents) could be represented like this:¶
The following diagram represents a simple Matroska file, comprised of an EBML Document with an EBML Header, a Segment Element (the Root Element), and all eight Matroska Top-Level Elements. In the following diagrams of this section, horizontal spacing expresses a parent-child relationship between Matroska Elements (e.g., the Info Element is contained within the Segment Element), whereas vertical alignment represents the storage order within the file.¶
The Matroska EBML Schema defines eight Top-Level Elements:¶
SeekHead (Section 6.3),¶
Info (Section 6.5),¶
Tracks (Section 18),¶
Chapters (Section 20),¶
Cluster (Section 10),¶
Cues (Section 22),¶
Attachments (Section 21),¶
Tags (Section 6.8).¶
The SeekHead Element (also known as MetaSeek) contains an index of Top-Level Elements locations within the Segment. Use of the SeekHead Element is RECOMMENDED. Without a SeekHead Element, a Matroska parser would have to search the entire file to find all of the other Top-Level Elements. This is due to Matroska's flexible ordering requirements; for instance, it is acceptable for the Chapters Element to be stored after the Cluster Elements.¶
The Info Element contains vital information for identifying the whole Segment. This includes the title for the Segment, a randomly generated unique identifier, and the unique identifier(s) of any linked Segment Elements.¶
The Tracks Element defines the technical details for each track and can store the name, number, unique identifier, language, and type (audio, video, subtitles, etc.) of each track. For example, the Tracks Element MAY store information about the resolution of a video track or sample rate of an audio track.¶
The Tracks Element MUST identify all the data needed by the codec to decode the data of the specified track. However, the data required is contingent on the codec used for the track. For example, a Track Element for uncompressed audio only requires the audio bit rate to be present. A codec such as AC-3 would require that the CodecID Element be present for all tracks, as it is the primary way to identify which codec to use to decode the track.¶
The Chapters Element lists all of the chapters. Chapters are a way to set predefined points to jump to in video or audio.¶
Cluster Elements contain the content for each track, e.g., video frames. A Matroska file SHOULD contain at least one Cluster Element. In the rare case it doesn't, there should be a form of Segment linking with other Segments, possibly using Chapters; see Section 17.¶
The Cluster Element helps to break up SimpleBlock or BlockGroup Elements and helps with seeking and error protection. Every Cluster Element MUST contain a Timestamp Element. This SHOULD be the Timestamp Element used to play the first Block in the Cluster Element, unless a different value is needed to accommodate more Blocks; see Section 11.2.¶
Cluster Elements contain one or more block elements, such as BlockGroup or SimpleBlock elements. In some situations, a Cluster Element MAY contain no block element, for example, in a live recording when no data has been collected.¶
A BlockGroup Element MAY contain a Block of data and any information relating directly to that Block.¶
Each Cluster MUST contain exactly one Timestamp Element. The Timestamp Element value MUST be stored once per Cluster. The Timestamp Element in the Cluster is relative to the entire Segment. The Timestamp Element SHOULD be the first Element in the Cluster it belongs to, or the second Element if that Cluster contains a CRC-32 element (Section 6.2).¶
Additionally, the Block contains an offset that, when added to the Cluster's Timestamp Element value, yields the Block's effective timestamp. Therefore, the timestamp in the Block itself is relative to the Timestamp Element in the Cluster. For example, if the Timestamp Element in the Cluster is set to 10 seconds and a Block in that Cluster is supposed to be played 12 seconds into the clip, the timestamp in the Block would be set to 2 seconds.¶
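The relationship between the Cluster timestamp and the Block offset can be sketched as follows. This is an illustrative example only; the function names are hypothetical, and whole seconds are used purely to mirror the example above rather than to describe how values are stored in a file.¶

```python
def block_offset(cluster_timestamp: int, block_time: int) -> int:
    """Offset to store in the Block so that it plays at block_time."""
    return block_time - cluster_timestamp


def effective_timestamp(cluster_timestamp: int, offset: int) -> int:
    """Effective Block timestamp: Cluster timestamp plus the Block offset."""
    return cluster_timestamp + offset


# The example from the text: Cluster at 10 s, Block to be played at 12 s.
offset = block_offset(cluster_timestamp=10, block_time=12)
print(offset)                            # 2
print(effective_timestamp(10, offset))   # 12
```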
The ReferenceBlock in the BlockGroup is used instead of the basic "P-frame"/"B-frame" description. Instead of simply saying that this Block depends on the Block directly before, or directly afterwards, the Timestamp of the necessary Block is used. Because there can be as many ReferenceBlock Elements as necessary for a Block, it allows for some extremely complex referencing.¶
The Cues Element is used to seek when playing back a file by providing a temporal index for some of the Tracks. It is similar to the SeekHead Element, but used for seeking to a specific time when playing back the file. It is possible to seek without this element, but it is much more difficult because a Matroska Reader would have to 'hunt and peck' through the file looking for the correct timestamp.¶
The Cues Element SHOULD contain at least one CuePoint Element. Each CuePoint Element stores the position of the Cluster that contains the BlockGroup or SimpleBlock Element. The timestamp is stored in the CueTime Element and location is stored in the CueTrackPositions Element.¶
The Cues Element is flexible. For instance, the Cues Element can be used to index every single timestamp of every Block, or the timestamps can be indexed selectively.¶
The Attachments Element is for attaching files to a Matroska file such as pictures, fonts, webpages, etc.¶
The Tags Element contains metadata that describes the Segment and potentially its Tracks, Chapters, and Attachments. Each Track or Chapter that those tags apply to has its UID listed in the Tags. The Tags contain all extra information about the file: scriptwriter, singer, actors, directors, titles, edition, price, dates, genre, comments, etc. Tags can contain their values in multiple languages. For example, a movie's "title" Tag might contain both the original English title as well as the title it was released as in Germany.¶
This specification includes an EBML Schema, which defines the Elements and structure of Matroska using the EBML Schema elements and attributes defined in Section 11.1 of [RFC8794]. The EBML Schema defines every valid Matroska element in a manner defined by the EBML specification.¶
Attributes using their default value (like minOccurs, minver, etc.) or with undefined values (like length, maxver, etc.) are omitted.¶
Here the definition of each Matroska Element is provided.¶
unknownsizeallowed: True¶
\Segment
¶
\Segment\SeekHead
¶
\Segment\SeekHead\Seek
¶
\Segment\SeekHead\Seek\SeekPosition
¶
recurring: True¶
\Segment\Info\SegmentUUID
¶
\Segment\Info\PrevUUID
¶
\Segment\Info\PrevFilename
¶
\Segment\Info\NextUUID
¶
\Segment\Info\NextFilename
¶
\Segment\Info\SegmentFamily
¶
ChapterTranslate
element, this Element is REQUIRED.¶
\Segment\Info\ChapterTranslate
¶
Segment
and a segment value in the given Chapter Codec.¶
\Segment\Info\ChapterTranslate\ChapterTranslateID
¶
\Segment\Info\ChapterTranslate\ChapterTranslateCodec
¶
ChapterTranslate
applies to this chapter codec of the given chapter edition(s); see Section 5.1.7.1.4.15.¶
defined values:¶
value | label | definition |
---|---|---|
0 | Matroska Script | Chapter commands using the Matroska Script codec. |
1 | DVD-menu | Chapter commands using the DVD-like codec. |
\Segment\Info\ChapterTranslate\ChapterTranslateEditionUID
¶
ChapterTranslate
applies.¶
ChapterTranslateEditionUID
is specified in the ChapterTranslate
, the ChapterTranslate
applies to all chapter editions found in the Segment using the given ChapterTranslateCodec
.¶
\Segment\Info\TimestampScale
¶
unknownsizeallowed: True¶
\Segment\Cluster
¶
\Segment\Cluster\Timestamp
¶
\Segment\Cluster\SimpleBlock
¶
\Segment\Cluster\BlockGroup
¶
\Segment\Cluster\BlockGroup\Block
¶
\Segment\Cluster\BlockGroup\BlockAdditions
¶
\Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID
¶
\Segment\Cluster\BlockGroup\BlockDuration
¶
notes:¶
attribute | note |
---|---|
minOccurs | BlockDuration MUST be set (minOccurs=1) if the associated TrackEntry stores a DefaultDuration value. |
default | When not written and with no DefaultDuration, the value is assumed to be the difference between the timestamp of this Block and the timestamp of the next Block in "display" order (not coding order). |
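The implied default described in the note above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name is hypothetical, and the sketch assumes that display order is the ascending order of Block timestamps.¶

```python
from typing import Dict, List


def implied_block_durations(block_timestamps: List[int]) -> Dict[int, int]:
    """Map each Block timestamp to its implied duration.

    Assumption: display order is ascending timestamp order, so the input
    (given in any order, e.g., coding order) is sorted first. The last
    Block has no following Block and therefore gets no entry.
    """
    display_order = sorted(block_timestamps)
    return {
        current: following - current
        for current, following in zip(display_order, display_order[1:])
    }


# Timestamps listed in coding order; durations follow display order.
durations = implied_block_durations([0, 120, 40, 80])
print(durations)
```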
\Segment\Cluster\BlockGroup\ReferencePriority
¶
\Segment\Cluster\BlockGroup\ReferenceBlock
¶
Block this Block depends on. Historically, Matroska Writers didn't write the actual Block(s) this Block depends on, but some Block in the past.¶
The value "0" MAY also be used to signify that this Block cannot be decoded on its own, but without knowledge of which Block is necessary. In this case, other ReferenceBlock Elements MUST NOT be found in the same BlockGroup.¶
If the BlockGroup doesn't have any ReferenceBlock element, then the Block it contains can be decoded without using any other Block data.¶
\Segment\Cluster\BlockGroup\DiscardPadding
¶
recurring: True¶
\Segment\Tracks\TrackEntry
¶
\Segment\Tracks\TrackEntry\TrackType
¶
TrackType defines the type of each frame found in the Track. The value SHOULD be stored on 1 octet.¶
defined values:¶
value | label | each frame contains |
---|---|---|
1 | video | An image. |
2 | audio | Audio samples. |
3 | complex | A mix of different other TrackType. The codec needs to define how the Matroska Player should interpret such data. |
16 | logo | An image to be rendered over the video track(s). |
17 | subtitle | Subtitle or closed caption data to be rendered over the video track(s). |
18 | buttons | Interactive button(s) to be rendered over the video track(s). |
32 | control | Metadata used to control the player of the Matroska Player. |
33 | metadata | Timed metadata that can be passed on to the Matroska Player. |
\Segment\Tracks\TrackEntry\FlagForced
¶
\Segment\Tracks\TrackEntry\FlagLacing
¶
\Segment\Tracks\TrackEntry\DefaultDecodedFieldDuration
¶
\Segment\Tracks\TrackEntry\TrackTimestampScale
¶
\Segment\Tracks\TrackEntry\MaxBlockAdditionID
¶
\Segment\Tracks\TrackEntry\BlockAdditionMapping
¶
\Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue
¶
\Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType
¶
\Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDExtraData
¶
\Segment\Tracks\TrackEntry\Language
¶
\Segment\Tracks\TrackEntry\CodecID
¶
\Segment\Tracks\TrackEntry\CodecDelay
¶
\Segment\Tracks\TrackEntry\SeekPreRoll
¶
\Segment\Tracks\TrackEntry\TrackTranslate
¶
TrackEntry
and a track value in the given Chapter Codec.¶
\Segment\Tracks\TrackEntry\TrackTranslate\TrackTranslateTrackID
¶
TrackEntry
in the chapter codec data.
The format depends on the ChapProcessCodecID
used; see Section 5.1.7.1.4.15.¶
\Segment\Tracks\TrackEntry\TrackTranslate\TrackTranslateCodec
¶
TrackTranslate
applies to this chapter codec of the given chapter edition(s); see Section 5.1.7.1.4.15.¶
defined values:¶
value | label | definition |
---|---|---|
0 | Matroska Script | Chapter commands using the Matroska Script codec. |
1 | DVD-menu | Chapter commands using the DVD-like codec. |
\Segment\Tracks\TrackEntry\TrackTranslate\TrackTranslateEditionUID
¶
TrackTranslate
applies.¶
TrackTranslateEditionUID
is specified in the TrackTranslate
, the TrackTranslate
applies to all chapter editions found in the Segment using the given TrackTranslateCodec
.¶
\Segment\Tracks\TrackEntry\Video
¶
\Segment\Tracks\TrackEntry\Video\FlagInterlaced
¶
defined values:¶
value | label | definition |
---|---|---|
0 | undetermined | Unknown status. This value SHOULD be avoided. |
1 | interlaced | Interlaced frames. |
2 | progressive | No interlacing. |
\Segment\Tracks\TrackEntry\Video\FieldOrder
¶
defined values:¶
value | label | definition |
---|---|---|
0 | progressive | Interlaced frames. This value SHOULD be avoided; setting FlagInterlaced to 2 is sufficient. |
1 | tff | Top field displayed first. Top field stored first. |
2 | undetermined | Unknown field order. This value SHOULD be avoided. |
6 | bff | Bottom field displayed first. Bottom field stored first. |
9 | bff(swapped) | Top field displayed first. Fields are interleaved in storage with the top line of the top field stored first. |
14 | tff(swapped) | Bottom field displayed first. Fields are interleaved in storage with the top line of the top field stored first. |
\Segment\Tracks\TrackEntry\Video\StereoMode
¶
restrictions:¶
value | label |
---|---|
0 | mono |
1 | side by side (left eye first) |
2 | top - bottom (right eye is first) |
3 | top - bottom (left eye is first) |
4 | checkboard (right eye is first) |
5 | checkboard (left eye is first) |
6 | row interleaved (right eye is first) |
7 | row interleaved (left eye is first) |
8 | column interleaved (right eye is first) |
9 | column interleaved (left eye is first) |
10 | anaglyph (cyan/red) |
11 | side by side (right eye first) |
12 | anaglyph (green/magenta) |
13 | both eyes laced in one Block (left eye is first) |
14 | both eyes laced in one Block (right eye is first) |
\Segment\Tracks\TrackEntry\Video\AlphaMode
¶
CodecID. Undefined values SHOULD NOT be used, as the behavior of known implementations is different (considered either as 0 or 1).¶
defined values:¶
value | label | definition |
---|---|---|
0 | none | The BlockAdditional Element with BlockAddID of "1" does not exist or SHOULD NOT be considered as containing such data. |
1 | present | The BlockAdditional Element with BlockAddID of "1" contains alpha channel data. |
\Segment\Tracks\TrackEntry\Video\OldStereoMode
¶
restrictions:¶
value | label |
---|---|
0 | mono |
1 | right eye |
2 | left eye |
3 | both eyes |
\Segment\Tracks\TrackEntry\Video\DisplayWidth
¶
notes:¶
attribute | note |
---|---|
default | If the DisplayUnit of the same TrackEntry is 0, then the default value for DisplayWidth is equal to PixelWidth - PixelCropLeft - PixelCropRight; else, there is no default value. |
\Segment\Tracks\TrackEntry\Video\DisplayHeight
¶
notes:¶
attribute | note |
---|---|
default | If the DisplayUnit of the same TrackEntry is 0, then the default value for DisplayHeight is equal to PixelHeight - PixelCropTop - PixelCropBottom; else, there is no default value. |
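The default values for DisplayWidth and DisplayHeight can be computed as in the following sketch. It is illustrative only; the function name and the keyword parameters are hypothetical, and the crop values are assumed to be 0 when not supplied by the caller.¶

```python
from typing import Optional, Tuple


def default_display_size(
    pixel_width: int,
    pixel_height: int,
    crop_left: int = 0,
    crop_right: int = 0,
    crop_top: int = 0,
    crop_bottom: int = 0,
    display_unit: int = 0,
) -> Optional[Tuple[int, int]]:
    """Return the default (DisplayWidth, DisplayHeight), or None.

    A default only exists when DisplayUnit is 0 (pixels); for any other
    DisplayUnit there is no default value.
    """
    if display_unit != 0:
        return None
    width = pixel_width - crop_left - crop_right
    height = pixel_height - crop_top - crop_bottom
    return (width, height)


# A 1920x1088 coded picture with 8 rows cropped at the bottom.
print(default_display_size(1920, 1088, crop_bottom=8))      # (1920, 1080)
# DisplayUnit 3 (display aspect ratio): no default applies.
print(default_display_size(1920, 1080, display_unit=3))     # None
```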
\Segment\Tracks\TrackEntry\Video\DisplayUnit
¶
restrictions:¶
value | label |
---|---|
0 | pixels |
1 | centimeters |
2 | inches |
3 | display aspect ratio |
4 | unknown |
\Segment\Tracks\TrackEntry\Video\UncompressedFourCC
¶
BITMAPINFO
[AVIFormat]. There is no definitive list of FourCC values, nor an official registry. Some common values for YUV pixel formats can be found at [MSYUV8], [MSYUV16] and [FourCC-YUV]. Some common values for uncompressed RGB pixel formats can be found at [MSRGB] and [FourCC-RGB].¶
notes:¶
attribute | note |
---|---|
minOccurs | UncompressedFourCC MUST be set (minOccurs=1) in TrackEntry, when the CodecID Element of the TrackEntry is set to "V_UNCOMPRESSED". |
\Segment\Tracks\TrackEntry\Video\Colour\MatrixCoefficients
¶
restrictions:¶
value | label |
---|---|
0 | Identity |
1 | ITU-R BT.709 |
2 | unspecified |
3 | reserved |
4 | US FCC 73.682 |
5 | ITU-R BT.470BG |
6 | SMPTE 170M |
7 | SMPTE 240M |
8 | YCoCg |
9 | BT2020 Non-constant Luminance |
10 | BT2020 Constant Luminance |
11 | SMPTE ST 2085 |
12 | Chroma-derived Non-constant Luminance |
13 | Chroma-derived Constant Luminance |
14 | ITU-R BT.2100-0 |
\Segment\Tracks\TrackEntry\Video\Colour\ChromaSubsamplingHorz
¶
\Segment\Tracks\TrackEntry\Video\Colour\ChromaSubsamplingVert
¶
\Segment\Tracks\TrackEntry\Video\Colour\CbSubsamplingHorz
¶
\Segment\Tracks\TrackEntry\Video\Colour\ChromaSitingHorz
¶
restrictions:¶
value | label |
---|---|
0 | unspecified |
1 | left collocated |
2 | half |
\Segment\Tracks\TrackEntry\Video\Colour\ChromaSitingVert
¶
restrictions:¶
value | label |
---|---|
0 | unspecified |
1 | top collocated |
2 | half |
\Segment\Tracks\TrackEntry\Video\Colour\Range
¶
restrictions:¶
value | label |
---|---|
0 | unspecified |
1 | broadcast range |
2 | full range (no clipping) |
3 | defined by MatrixCoefficients / TransferCharacteristics |
\Segment\Tracks\TrackEntry\Video\Colour\TransferCharacteristics
¶
restrictions:¶
value | label |
---|---|
0 | reserved |
1 | ITU-R BT.709 |
2 | unspecified |
3 | reserved2 |
4 | Gamma 2.2 curve - BT.470M |
5 | Gamma 2.8 curve - BT.470BG |
6 | SMPTE 170M |
7 | SMPTE 240M |
8 | Linear |
9 | Log |
10 | Log Sqrt |
11 | IEC 61966-2-4 |
12 | ITU-R BT.1361 Extended Colour Gamut |
13 | IEC 61966-2-1 |
14 | ITU-R BT.2020 10 bit |
15 | ITU-R BT.2020 12 bit |
16 | ITU-R BT.2100 Perceptual Quantization |
17 | SMPTE ST 428-1 |
18 | ARIB STD-B67 (HLG) |
\Segment\Tracks\TrackEntry\Video\Colour\Primaries
¶
restrictions:¶
value | label |
---|---|
0 | reserved |
1 | ITU-R BT.709 |
2 | unspecified |
3 | reserved2 |
4 | ITU-R BT.470M |
5 | ITU-R BT.470BG - BT.601 625 |
6 | ITU-R BT.601 525 - SMPTE 170M |
7 | SMPTE 240M |
8 | FILM |
9 | ITU-R BT.2020 |
10 | SMPTE ST 428-1 |
11 | SMPTE RP 432-2 |
12 | SMPTE EG 432-2 |
22 | EBU Tech. 3213-E - JEDEC P22 phosphors |
\Segment\Tracks\TrackEntry\Video\Projection\ProjectionType
¶
restrictions:¶
value | label |
---|---|
0 | rectangular |
1 | equirectangular |
2 | cubemap |
3 | mesh |
\Segment\Tracks\TrackEntry\Video\Projection\ProjectionPrivate
¶
If ProjectionType equals 0 (Rectangular), then this element MUST NOT be present.¶
If ProjectionType equals 1 (Equirectangular), then this element MUST be present and contain the same binary data that would be stored inside an ISOBMFF Equirectangular Projection Box ('equi').¶
If ProjectionType equals 2 (Cubemap), then this element MUST be present and contain the same binary data that would be stored inside an ISOBMFF Cubemap Projection Box ('cbmp').¶
If ProjectionType equals 3 (Mesh), then this element MUST be present and contain the same binary data that would be stored inside an ISOBMFF Mesh Projection Box ('mshp').¶
\Segment\Tracks\TrackEntry\Video\Projection\ProjectionPoseYaw
¶
Value represents a clockwise rotation, in degrees, around the up vector. This rotation must be applied before any ProjectionPosePitch or ProjectionPoseRoll rotations. The value of this element MUST be in the -180 to 180 degree range, both included.¶
Setting ProjectionPoseYaw to 180 or -180 degrees, with the ProjectionPoseRoll and ProjectionPosePitch set to 0 degrees, flips the image horizontally.¶
\Segment\Tracks\TrackEntry\Video\Projection\ProjectionPosePitch
¶
Value represents a counter-clockwise rotation, in degrees, around the right vector. This rotation must be applied after the ProjectionPoseYaw rotation and before the ProjectionPoseRoll rotation. The value of this element MUST be in the -90 to 90 degree range, both included.¶
\Segment\Tracks\TrackEntry\Video\Projection\ProjectionPoseRoll
¶
Value represents a counter-clockwise rotation, in degrees, around the forward vector. This rotation must be applied after the ProjectionPoseYaw and ProjectionPosePitch rotations. The value of this element MUST be in the -180 to 180 degree range, both included.¶
Setting ProjectionPoseRoll to 180 or -180 degrees and the ProjectionPoseYaw to 180 or -180 degrees, with ProjectionPosePitch set to 0 degrees, flips the image vertically.¶
Setting ProjectionPoseRoll to 180 or -180 degrees, with the ProjectionPoseYaw and ProjectionPosePitch set to 0 degrees, flips the image horizontally and vertically.¶
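The value ranges of the three pose elements can be validated together, as in the following sketch. The function name pose_is_valid is hypothetical; only the ranges themselves come from the element descriptions above.¶

```python
def pose_is_valid(yaw: float, pitch: float, roll: float) -> bool:
    """Check ProjectionPoseYaw, ProjectionPosePitch, and ProjectionPoseRoll
    against their allowed ranges, in degrees, bounds included."""
    yaw_ok = -180.0 <= yaw <= 180.0
    pitch_ok = -90.0 <= pitch <= 90.0
    roll_ok = -180.0 <= roll <= 180.0
    return yaw_ok and pitch_ok and roll_ok


print(pose_is_valid(180.0, 0.0, -180.0))   # True: bounds are included
print(pose_is_valid(0.0, 90.5, 0.0))       # False: pitch out of range
```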
\Segment\Tracks\TrackEntry\Audio
¶
\Segment\Tracks\TrackEntry\Audio\OutputSamplingFrequency
¶
notes:¶
attribute | note |
---|---|
default | The default value for OutputSamplingFrequency of the same TrackEntry is equal to the SamplingFrequency. |
\Segment\Tracks\TrackEntry\TrackOperation
¶
stream copy: True (Section 8)¶
\Segment\Tracks\TrackEntry\TrackOperation\TrackCombinePlanes\TrackPlane\TrackPlaneType
¶
restrictions:¶
value | label |
---|---|
0 | left eye |
1 | right eye |
2 | background |
\Segment\Tracks\TrackEntry\ContentEncodings
¶
stream copy: True (Section 8)¶
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncodingOrder
¶
ContentEncoding of the ContentEncodings. The decoder/demuxer MUST start with the ContentEncoding with the highest ContentEncodingOrder and work its way down to the ContentEncoding with the lowest ContentEncodingOrder. This value MUST be unique for each ContentEncoding found in the ContentEncodings of this TrackEntry.¶
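The decoding order described above can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and each ContentEncoding is represented only by its ContentEncodingOrder value.¶

```python
from typing import List


def decoding_order(content_encoding_orders: List[int]) -> List[int]:
    """Return the ContentEncodingOrder values in the order a decoder/demuxer
    processes them: highest first, working down to the lowest.

    Raises ValueError when two ContentEncoding elements of the same
    TrackEntry share a ContentEncodingOrder, since the value must be unique.
    """
    if len(set(content_encoding_orders)) != len(content_encoding_orders):
        raise ValueError("ContentEncodingOrder must be unique per TrackEntry")
    return sorted(content_encoding_orders, reverse=True)


print(decoding_order([0, 2, 1]))   # [2, 1, 0]
```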
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncodingScope
¶
defined values:¶
value | label | definition |
---|---|---|
1 | Block | All frame contents, excluding lacing data. |
2 | Private | The track's CodecPrivate data. |
4 | Next | The next ContentEncoding (next ContentEncodingOrder. Either the data inside ContentCompression and/or ContentEncryption). This value SHOULD NOT be used as it's not supported by players. |
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncodingType
¶
restrictions:¶
value | label |
---|---|
0 | Compression |
1 | Encryption |
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentCompression
¶
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentCompression\ContentCompAlgo
¶
defined values:¶
value | label | definition |
---|---|---|
0 | zlib | zlib compression [RFC1950]. |
1 | bzlib | bzip2 compression [BZIP2], SHOULD NOT be used; see usage notes. |
2 | lzo1x | Lempel-Ziv-Oberhumer compression [LZO], SHOULD NOT be used; see usage notes. |
3 | Header Stripping | Octets in ContentCompSettings (Section 5.1.4.1.31.7) have been stripped from each frame. |
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentCompression\ContentCompSettings
¶
ContentCompAlgo=3), the bytes that were removed from the beginning of each frame of the track.¶
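Header Stripping can be illustrated with the following sketch. The function names are hypothetical and the byte values are made up for the example; the only behavior taken from the text is that the bytes in ContentCompSettings were removed from the beginning of each frame, so a reader puts them back in front.¶

```python
def strip_header(frame: bytes, content_comp_settings: bytes) -> bytes:
    """Writer side: remove the shared leading bytes from a frame."""
    if not frame.startswith(content_comp_settings):
        raise ValueError("frame does not start with the bytes to strip")
    return frame[len(content_comp_settings):]


def restore_header(stripped_frame: bytes, content_comp_settings: bytes) -> bytes:
    """Reader side: prepend the stripped bytes to recover the frame."""
    return content_comp_settings + stripped_frame


settings = b"\x0b\x77"                  # hypothetical shared frame prefix
original = b"\x0b\x77\x01\x02\x03"      # hypothetical frame payload
stored = strip_header(original, settings)
print(stored)                                        # b'\x01\x02\x03'
print(restore_header(stored, settings) == original)  # True
```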
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncryption
¶
ContentEncodingType
is 1 (encryption) and MUST be ignored otherwise.
A Matroska Player MAY support encryption.¶
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncryption\ContentEncAlgo
¶
defined values:¶
value | label | definition |
---|---|---|
0 | Not encrypted | The data are not encrypted. |
1 | DES | Data Encryption Standard (DES) [FIPS.46-3]. This value SHOULD be avoided. |
2 | 3DES | Triple Data Encryption Algorithm [SP.800-67]. This value SHOULD be avoided. |
3 | Twofish | Twofish Encryption Algorithm [Twofish]. |
4 | Blowfish | Blowfish Encryption Algorithm [Blowfish]. This value SHOULD be avoided. |
5 | AES | Advanced Encryption Standard (AES) [FIPS.197]. |
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncryption\ContentEncAESSettings
¶
notes:¶
attribute | note |
---|---|
maxOccurs | ContentEncAESSettings MUST NOT be set (maxOccurs=0) if ContentEncAlgo is not AES (5). |
\Segment\Tracks\TrackEntry\ContentEncodings\ContentEncoding\ContentEncryption\ContentEncAESSettings\AESSettingsCipherMode
¶
defined values:¶
value | label | definition |
---|---|---|
1 | AES-CTR | Counter [SP.800-38A]. |
2 | AES-CBC | Cipher Block Chaining [SP.800-38A]. |
notes:¶
attribute | note |
---|---|
maxOccurs | AESSettingsCipherMode MUST NOT be set (maxOccurs=0) if ContentEncAlgo is not AES (5). |
\Segment\Cues
¶
notes:¶
attribute | note |
---|---|
minOccurs | This Element SHOULD be set when the Segment is not transmitted as a live stream; see Section 23.2. |
\Segment\Cues\CuePoint
¶
\Segment\Cues\CuePoint\CueTime
¶
\Segment\Cues\CuePoint\CueTrackPositions
¶
\Segment\Cues\CuePoint\CueTrackPositions\CueClusterPosition
¶
\Segment\Cues\CuePoint\CueTrackPositions\CueDuration
¶
\Segment\Cues\CuePoint\CueTrackPositions\CueCodecState
¶
\Segment\Attachments
¶
\Segment\Attachments\AttachedFile
¶
recurring: True¶
\Segment\Chapters\EditionEntry
¶
recursive: True¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterTimeStart
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterTimeEnd
¶
ChapterTimeStart of the same ChapterAtom.¶
ChapterTimeEnd timestamp value being excluded, it MUST take into account the duration of the last frame it includes, especially for the ChapterAtom using the last frames of the Segment.¶
notes:¶
attribute | note |
---|---|
minOccurs | ChapterTimeEnd MUST be set (minOccurs=1) if the Edition is an ordered edition (see Section 20.1.3), unless it's a Parent Chapter (see Section 20.2.3). |
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterSegmentUUID
¶
SegmentUUID
value of the Segment
it belongs to.¶
notes:¶
attribute | note |
---|---|
minOccurs | ChapterSegmentUUID MUST be set (minOccurs=1) if ChapterSegmentEditionUID is used; see Section 17.2 on medium-linking Segments. |
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterSegmentEditionUID
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterPhysicalEquiv
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterDisplay\ChapLanguage
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterDisplay\ChapLanguageBCP47
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapterDisplay\ChapCountry
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapProcess\ChapProcessCodecID
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapProcess\ChapProcessPrivate
¶
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapProcess\ChapProcessCommand\ChapProcessTime
¶
restrictions:¶
value | label |
---|---|
0 | during the whole chapter |
1 | before starting playback |
2 | after playback of the chapter |
\Segment\Chapters\EditionEntry\+ChapterAtom\ChapProcess\ChapProcessCommand\ChapProcessData
¶
Except for the EBML Header and the CRC-32 Element, the EBML specification does not require any particular storage order for Elements. This specification, however, defines mandates and recommendations for ordering certain Elements in order to facilitate better playback, seeking, and editing efficiency. This section describes and offers rationale for ordering requirements and recommendations for Matroska.¶
The Info Element is the only REQUIRED Top-Level Element in a Matroska file. To be playable, Matroska MUST also contain at least one Tracks Element and Cluster Element. The first Info Element and the first Tracks Element MUST either be stored before the first Cluster Element or both SHALL be referenced by a SeekHead Element occurring before the first Cluster Element.¶
All Top-Level Elements
MUST use a 4-octet long EBML Element ID.¶
When using Medium Linking, chapters are used to reference other Segments to play in a given order; see Section 17.2.
A Segment containing these linked Chapters does not require a Tracks Element or a Cluster Element.¶
It is possible to edit a Matroska file after it has been created. For example, chapters,
tags, or attachments can be added. When new Top-Level Elements
are added to a Matroska file,
the SeekHead
Element(s) MUST be updated so that the SeekHead
Element(s) itemize
the identity and position of all Top-Level Elements
.¶
Editing, removing, or adding
Elements
to a Matroska file often requires that some existing Elements
be voided
or extended.
Transforming the existing Elements
into Void Elements
as padding can be used
as a method to avoid moving large amounts of data around.¶
As noted by the EBML specification, if a CRC-32 Element
is used, then the CRC-32 Element
MUST be the first ordered Element
within its Parent Element
.¶
In Matroska all Top-Level Elements
of an EBML Document SHOULD include a CRC-32 Element
as their first Child Element
.
The Segment Element
, which is the Root Element
, SHOULD NOT have a CRC-32 Element
.¶
If used, the first SeekHead Element
MUST be the first non-CRC-32 Child Element
of the Segment Element
. If a second SeekHead Element
is used, then the first
SeekHead Element
MUST reference the identity and position of the second SeekHead
.¶
Additionally, the second SeekHead Element
MUST only reference Cluster
Elements
and not any other Top-Level Element
already contained within the first SeekHead Element
.¶
The second SeekHead Element
MAY be stored in any order relative to the other Top-Level Elements
.
Whether one or two SeekHead Element(s)
are used, the SeekHead Element(s)
MUST
collectively reference the identity and position of all Top-Level Elements
except
for the first SeekHead Element
.¶
The Cues Element
is RECOMMENDED to optimize seeking access in Matroska. It is
programmatically simpler to add the Cues Element
after all Cluster Elements
have been written because this does not require a prediction of how much space to
reserve before writing the Cluster Elements
. However, storing the Cues Element
before the Cluster Elements
can provide some seeking advantages. If the Cues Element
is present, then it SHOULD either be stored before the first Cluster Element
or be referenced by a SeekHead Element
.¶
The first Info Element
SHOULD occur before the first Tracks Element
and first
Cluster Element
except when referenced by a SeekHead Element
.¶
The Chapters Element
SHOULD be placed before the Cluster Element(s)
. The
Chapters Element
can be used during playback even if the user does not need to seek.
It immediately gives the user information about what section is being read and what
other sections are available. In the case of Ordered Chapters it is RECOMMENDED to evaluate
the logical linking even before playing. The Chapters Element
SHOULD be placed before
the first Tracks Element
and after the first Info Element
.¶
The Attachments Element
is not intended to be used by default when playing the file,
but could contain information relevant to the content, such as cover art or fonts.
Cover art is useful even before the file is played and fonts could be needed before playback
starts for initialization of subtitles. The Attachments Element
MAY be placed before
the first Cluster Element
; however, if the Attachments Element
is likely to be edited,
then it SHOULD be placed after the last Cluster Element
.¶
Matroska is based upon the principle that a reading application does not have to support 100% of the specifications in order to be able to play the file. A Matroska file therefore contains version indicators that tell a reading application what to expect.¶
It is possible and valid to have the version fields indicate that the file contains
Matroska Elements
from a higher specification version number while signaling that a
reading application MUST only support a lower version number properly in order to play
it back (possibly with a reduced feature set).¶
The EBML Header
of each Matroska document informs the reading application on what
version of Matroska to expect. The Elements
within EBML Header
with jurisdiction
over this information are DocTypeVersion
and DocTypeReadVersion
.¶
DocTypeVersion
MUST be equal to or greater than the highest Matroska version number of
any Element
present in the Matroska file. For example, a file using the SimpleBlock Element
(Section 5.1.3.4)
MUST have a DocTypeVersion
equal to or greater than 2. A file containing CueRelativePosition
Elements (Section 5.1.5.1.2.3) MUST have a DocTypeVersion
equal to or greater than 4.¶
The DocTypeReadVersion MUST contain the minimum version number that a reading application needs to support in order to play the file back -- optionally with a reduced feature set. For example, if a file contains only Elements of version 2 or lower except for CueRelativePosition (which is a version 4 Matroska Element), then DocTypeReadVersion SHOULD still be set to 2 and not 4 because evaluating CueRelativePosition is not necessary for standard playback -- it only makes seeking more precise if used.¶
A reading application supporting Matroska version V MUST NOT refuse to read a file with DocTypeReadVersion equal to or lower than V, even if DocTypeVersion is greater than V.¶
A reading application
supporting at least Matroska version V
reading a file whose DocTypeReadVersion
field is equal to or lower than V
MUST skip Matroska/EBML Elements
it encounters
but does not know about if that unknown element fits into the size constraints set
by the current Parent Element
.¶
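The version rules above can be sketched as two small checks. This is a minimal illustration, not a normative algorithm; the function names `can_read` and `min_doc_type_version` are hypothetical, and the per-element version numbers are assumed to come from the element definitions of this specification.¶

```python
def can_read(doc_type_read_version: int, supported_version: int) -> bool:
    """True when a reader supporting `supported_version` must not refuse the file.

    Only DocTypeReadVersion matters here; DocTypeVersion may be higher.
    """
    return doc_type_read_version <= supported_version


def min_doc_type_version(element_versions) -> int:
    """Lowest DocTypeVersion a writer may signal for the elements it wrote."""
    return max(element_versions)


# A file using SimpleBlock (version 2) and CueRelativePosition (version 4):
doc_type_version = min_doc_type_version([1, 2, 4])  # must be at least 4
doc_type_read_version = 2                           # CueRelativePosition is optional for playback
print(doc_type_version, can_read(doc_type_read_version, supported_version=2))
```

A reader that accepts such a file still has to skip the unknown elements it encounters, as described above.¶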
It is sometimes necessary to create a Matroska file from another Matroska file, for example to add subtitles in a language or to edit out a portion of the content. Some values from the original Matroska file need to be kept the same in the destination file. For example, the SamplingFrequency of an audio track wouldn't change between the two files. Some other values may change between the two files, for example the TrackNumber of an audio track when another track has been added.¶
An Element is marked with a property: stream copy: True
when the values of that Element need to be kept identical between the source and destination file.
If that property is not set, elements may or may not keep the same value between the source and destination.¶
The DefaultDecodedFieldDuration Element
can signal to the displaying application how
often fields of a video sequence will be available for displaying. It can be used for both
interlaced and progressive content.¶
If the video sequence is signaled as interlaced (Section 5.1.4.1.28.1), then DefaultDecodedFieldDuration equals the period between two successive fields at the output of the decoding process.
For video sequences signaled as progressive, DefaultDecodedFieldDuration is half of the period between two successive frames at the output of the decoding process.¶
These values are valid at the end of the decoding process before post-processing (such as deinterlacing or inverse telecine) is applied.¶
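As a hedged illustration of the two cases, the following sketch derives a DefaultDecodedFieldDuration value in nanoseconds from a decoder output rate. The helper names are hypothetical, and rounding to an integer number of nanoseconds is an assumption of this sketch.¶

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def field_duration_interlaced(fields_per_second: Fraction) -> int:
    """Period between two successive fields at the decoder output, in ns."""
    return round(Fraction(NS_PER_SECOND) / fields_per_second)


def field_duration_progressive(frames_per_second: Fraction) -> int:
    """Half of the period between two successive frames, in ns."""
    return round(Fraction(NS_PER_SECOND) / frames_per_second / 2)


print(field_duration_interlaced(Fraction(50)))             # 50 fields per second
print(field_duration_progressive(Fraction(25)))            # 25 frames per second
print(field_duration_progressive(Fraction(30000, 1001)))   # 29.97 frames per second
```

Using `Fraction` keeps rates such as 30000/1001 exact until the final rounding step.¶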
Examples:¶
Frames using references SHOULD be stored in "coding order". That means the references first, and then the frames referencing them. A consequence is that timestamps might not be consecutive. But a frame with a past timestamp MUST reference a frame already known; otherwise, it is considered bad/void.¶
Matroska has two similar ways to store frames in a block:¶
Block
which is contained inside a BlockGroup
,¶
SimpleBlock
which is directly in the Cluster
.¶
The SimpleBlock
is usually preferred unless some extra elements of the BlockGroup
need to be used.
A Matroska Reader MUST support both types of blocks.¶
Each block contains the same parts in the following order:¶
The block header starts with the number of the Track it corresponds to.
The value MUST correspond to the TrackNumber (Section 5.1.4.1.1) of a TrackEntry of the Segment.¶
The TrackNumber
is coded using the VINT mechanism described in Section 4 of [RFC8794].
To save space, the shortest VINT form SHOULD be used. The value can be coded on up to 8 octets.
This is the only element with a variable size in the block header.¶
The timestamp is expressed in Track Ticks; see Section 11.1. The value is stored as a signed value on 16 bits.¶
This section describes the binary data contained in the Block Element (Section 5.1.3.5.1). Bit 0 is the most significant bit.¶
As the TrackNumber
size can vary between 1 and 8 octets, there are 8 different sizes for the Block
header.
We only provide the definitions for TrackNumber
sizes of 1 and 2.
The other variants can be deduced by extending the size of the TrackNumber
by multiples of 8 bits.¶
where:¶
using lacing mode¶
The following data in the Block
correspond to the lacing data and frames usage as described in each respective lacing mode.¶
This section describes the binary data contained in the SimpleBlock Element (Section 5.1.3.4). Bit 0 is the most significant bit.¶
The SimpleBlock
is inspired by the Block structure; see Section 10.1.
The main differences are the added Keyframe flag and Discardable flag. Otherwise, everything is the same.¶
As the TrackNumber
size can vary between 1 and 8 octets, there are 8 different sizes for the SimpleBlock
header.
We only provide the definitions for TrackNumber
sizes of 1 and 2.
The other variants can be deduced by extending the size of the TrackNumber
by multiples of 8 bits.¶
where:¶
using lacing mode¶
The following data in the SimpleBlock
correspond to the lacing data and frames usage as described in each respective lacing mode.¶
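The header layout described above (a VINT TrackNumber, a signed 16-bit timestamp, then a flags octet) can be sketched as follows. This is an illustrative parser under stated assumptions, not a reference implementation: the function names are hypothetical, the timestamp is read as big-endian, and only the Keyframe flag (bit 0) and the lacing bits (bits 5-6) are decoded.¶

```python
import struct

# Lacing values carried in bits 5-6 of the flags octet (bit 0 is the MSB).
LACING_MODES = {0b00: "none", 0b01: "Xiph", 0b10: "fixed-size", 0b11: "EBML"}


def read_vint(data: bytes, pos: int = 0):
    """Decode an EBML VINT at `pos`; returns (value, length in octets)."""
    first = data[pos]
    if first == 0:
        raise ValueError("VINT longer than 8 octets")
    length = 9 - first.bit_length()        # leading zero bits + 1
    if pos + length > len(data):
        raise ValueError("truncated VINT")
    value = first & (0xFF >> length)       # drop the length marker bit
    for octet in data[pos + 1 : pos + length]:
        value = (value << 8) | octet
    return value, length


def parse_block_header(data: bytes) -> dict:
    """Parse the header of a Block or SimpleBlock payload."""
    track_number, n = read_vint(data)
    (timestamp,) = struct.unpack_from(">h", data, n)   # signed 16-bit Track Ticks
    flags = data[n + 2]
    return {
        "track_number": track_number,
        "timestamp": timestamp,
        "keyframe": bool(flags & 0x80),    # bit 0; only meaningful in a SimpleBlock
        "lacing": LACING_MODES[(flags >> 1) & 0b11],
        "header_size": n + 3,
    }


# One-octet TrackNumber 1, timestamp -40, Keyframe flag set, no lacing.
print(parse_block_header(bytes([0x81, 0xFF, 0xD8, 0x80])))
```

The `header_size` value gives the offset where the lacing data or the single frame starts, which is 4 for a one-octet TrackNumber.¶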
Lacing is a mechanism to save space when storing data. It is typically used for small blocks
of data (referred to as frames in Matroska). It packs multiple frames into a single Block
or SimpleBlock
.¶
Lacing MUST NOT be used to store a single frame in a Block
or SimpleBlock
.¶
There are 3 types of lacing:¶
When lacing is not used, i.e. to store a single frame, the lacing bits 5 and 6 of the Block
or SimpleBlock
MUST be set to zero.¶
For example, a user wants to store 3 frames of the same track. The first frame is 800 octets long, the second is 500 octets long and the third is 1000 octets long. As these data are small, they can be stored in a lace to save space.¶
It is possible not to use lacing at all and just store a single frame without any extra data. When the FlagLacing -- Section 5.1.4.1.12 -- is set to "0" all blocks of that track MUST NOT use lacing.¶
When no lacing is used, the number of frames in the lace is omitted and only one frame can be stored in the Block.
The bits 5-6 of the Block Header flags are set to 0b00.¶
The Block for an 800 octets frame is as follows:¶
When a Block contains a single frame, it MUST use this No lacing mode.¶
The Xiph lacing uses the same coding of size as found in the Ogg container [RFC3533].
The bits 5-6 of the Block Header flags are set to 0b01
.¶
The Block data with laced frames is stored as follows:¶
Each lacing size is coded as a sequence of unsigned octets whose sum is the size: one octet with the value 255 for every full 255 in the size, followed by a final octet lower than 255 -- for example, 500 is coded 255;245 or [0xFF 0xF5]. A frame whose size is a multiple of 255 is coded with a 0 at the end of the size -- for example, 765 is coded 255;255;255;0 or [0xFF 0xFF 0xFF 0x00].¶
The size of the last frame is deduced from the size remaining in the Block after the other frames.¶
Because large sizes result in large coding of the sizes, it is RECOMMENDED to use Xiph lacing only with small frames.¶
In our example, the 800, 500 and 1000 frames are stored with Xiph lacing in a Block as follows:¶
Block Octet | Value | Description |
---|---|---|
4 | 0x02 | Number of frames minus 1 |
5-8 | 0xFF 0xFF 0xFF 0x23 | Size of the first frame (255;255;255;35) |
9-10 | 0xFF 0xF5 | Size of the second frame (255;245) |
11-810 | <frame1> | First frame data |
811-1310 | <frame2> | Second frame data |
1311-2310 | <frame3> | Third frame data |
The Block is 2311 octets large and the last frame starts at 1311, so we can deduce the size of the last frame is 2311 - 1311 = 1000.¶
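The Xiph size coding and the deduction of the last frame size can be sketched as below. The function names are hypothetical, and `lace` is assumed to hold the Block payload starting at the frame-count octet (octet 4 in the example, for a one-octet TrackNumber).¶

```python
def xiph_encode_size(size: int) -> bytes:
    """Code a frame size as a run of 255 octets plus a final octet below 255."""
    return b"\xff" * (size // 255) + bytes([size % 255])


def xiph_lace_header(frame_sizes) -> bytes:
    """Frame count minus 1, then the sizes of all frames except the last."""
    if len(frame_sizes) < 2:
        raise ValueError("lacing must not be used for a single frame")
    out = bytearray([len(frame_sizes) - 1])
    for size in frame_sizes[:-1]:
        out += xiph_encode_size(size)
    return bytes(out)


def xiph_decode_sizes(lace: bytes):
    """Recover every frame size; the last one is deduced from what remains."""
    count = lace[0] + 1
    pos, sizes = 1, []
    for _ in range(count - 1):
        size = 0
        while True:
            octet = lace[pos]
            pos += 1
            size += octet
            if octet < 255:        # an octet below 255 ends the size
                break
        sizes.append(size)
    sizes.append(len(lace) - pos - sum(sizes))
    return sizes


header = xiph_lace_header([800, 500, 1000])
print(header.hex(" "))
print(xiph_decode_sizes(header + bytes(800 + 500 + 1000)))
```

The header bytes produced for the 800, 500 and 1000 octet frames match octets 4 to 10 of the table above.¶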
The EBML lacing encodes the frame size with an EBML-like encoding [RFC8794].
The bits 5-6 of the Block Header flags are set to 0b11
.¶
The Block data with laced frames is stored as follows:¶
The first frame size is encoded as an EBML Variable-Size Integer value, also known as VINT in [RFC8794].
The remaining frame sizes are encoded as signed values using the difference between the frame size and the previous frame size.
These signed values are encoded as VINT, with a mapping from signed to unsigned numbers.
Decoding the unsigned number stored in the VINT to a signed number is done by subtracting 2^((7*n)-1)-1, where n is the octet size of the VINT.¶
Bit Representation of signed VINT | Possible Value Range |
---|---|
1xxx xxxx | 2^7 values from -(2^6-1) to 2^6 |
01xx xxxx xxxx xxxx | 2^14 values from -(2^13-1) to 2^13 |
001x xxxx xxxx xxxx xxxx xxxx | 2^21 values from -(2^20-1) to 2^20 |
0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx | 2^28 values from -(2^27-1) to 2^27 |
0000 1xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx | 2^35 values from -(2^34-1) to 2^34 |
In our example, the 800, 500 and 1000 frames are stored with EBML lacing in a Block as follows:¶
Block Octets | Value | Description |
---|---|---|
4 | 0x02 | Number of frames minus 1 |
5-6 | 0x43 0x20 | Size of the first frame (800 = 0x320 + 0x4000) |
7-8 | 0x5E 0xD3 | Size of the second frame (500 - 800 = -300 = - 0x12C + 0x1FFF + 0x4000) |
9-808 | <frame1> | First frame data |
809-1308 | <frame2> | Second frame data |
1309-2308 | <frame3> | Third frame data |
The Block is 2309 octets large and the last frame starts at 1309, so we can deduce the size of the last frame is 2309 - 1309 = 1000.¶
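The EBML lacing size coding can be sketched as follows: the first size is an unsigned VINT, and each following size (except the last) is a signed difference biased by 2^((7*n)-1)-1. The function names are hypothetical. As a conservative assumption of this sketch, the encoder never emits the all-ones VINT value, so it may pick a longer form than strictly needed at the top of each range.¶

```python
def encode_vint(value: int, length: int) -> bytes:
    """Write `value` as a VINT of `length` octets (marker bit, then data bits)."""
    return ((1 << (7 * length)) | value).to_bytes(length, "big")


def encode_unsigned(value: int) -> bytes:
    for n in range(1, 9):
        if value < (1 << (7 * n)) - 1:     # avoid the all-ones pattern (assumption)
            return encode_vint(value, n)
    raise ValueError("size too large")


def encode_signed(delta: int) -> bytes:
    for n in range(1, 9):
        bias = (1 << (7 * n - 1)) - 1
        if -bias <= delta <= bias:
            return encode_vint(delta + bias, n)
    raise ValueError("difference too large")


def ebml_lace_header(frame_sizes) -> bytes:
    """Frame count minus 1, first size, then differences for all but the last frame."""
    out = bytearray([len(frame_sizes) - 1])
    out += encode_unsigned(frame_sizes[0])
    for prev, cur in zip(frame_sizes, frame_sizes[1:-1]):
        out += encode_signed(cur - prev)
    return bytes(out)


def read_vint(data: bytes, pos: int):
    first = data[pos]
    if first == 0:
        raise ValueError("VINT longer than 8 octets")
    length = 9 - first.bit_length()
    value = first & (0xFF >> length)
    for octet in data[pos + 1 : pos + length]:
        value = (value << 8) | octet
    return value, length


def ebml_decode_sizes(lace: bytes):
    """`lace` starts at the frame-count octet and runs to the end of the Block."""
    count = lace[0] + 1
    size, n = read_vint(lace, 1)
    pos, sizes = 1 + n, [size]
    for _ in range(count - 2):
        raw, n = read_vint(lace, pos)
        size += raw - ((1 << (7 * n - 1)) - 1)   # undo the signed bias
        sizes.append(size)
        pos += n
    sizes.append(len(lace) - pos - sum(sizes))   # last size is deduced
    return sizes


header = ebml_lace_header([800, 500, 1000])
print(header.hex(" "))
print(ebml_decode_sizes(header + bytes(800 + 500 + 1000)))
```

For the 800, 500 and 1000 octet frames, the header comes out as the frame count followed by 0x43 0x20 and 0x5E 0xD3, as in the table above.¶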
The Fixed-size lacing doesn't store the frame size, only the number of frames in the lace.
Each frame MUST have the same size. The frame size of each frame is deduced from the total size of the Block.
The bits 5-6 of the Block Header flags are set to 0b10
.¶
The Block data with laced frames is stored as follows:¶
For example, for 3 frames of 800 octets each:¶
Block Octets | Value | Description |
---|---|---|
4 | 0x02 | Number of frames minus 1 |
5-804 | <frame1> | First frame data |
805-1604 | <frame2> | Second frame data |
1605-2404 | <frame3> | Third frame data |
This gives a Block of 2405 octets. When reading the Block we find that there are 3 frames (Octet 4). The data start at Octet 5, so the size of each frame is (2405 - 5) / 3 = 800.¶
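The same deduction can be written as a short sketch. The function name is hypothetical, and `lace` is assumed to hold the Block payload starting at the frame-count octet (octet 4 in the example above).¶

```python
def split_fixed_lace(lace: bytes):
    """Split a fixed-size lace into its equally sized frames."""
    count = lace[0] + 1                 # stored as number of frames minus 1
    payload = lace[1:]
    if len(payload) % count:
        raise ValueError("payload is not a multiple of the frame count")
    size = len(payload) // count
    return [payload[i * size : (i + 1) * size] for i in range(count)]


frames = split_fixed_lace(bytes([0x02]) + bytes(3 * 800))
print(len(frames), [len(f) for f in frames])
```

Rejecting a payload that does not divide evenly is a design choice of this sketch; the text above only requires that each frame has the same size.¶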
A Block only contains a single timestamp value. But when lacing is used, it contains more than one frame, and each frame originally has its own timestamp, or Presentation Timestamp (PTS). The Block timestamp applies to the first frame in the lace.¶
In the lace, each frame after the first one has an underdetermined timestamp. But each of these frames MUST be contiguous -- i.e. the decoded data MUST NOT contain any gap between them. If there is a gap in the stream, the frames around the gap MUST NOT be in the same Block.¶
Lacing is only useful for small contiguous data to save space. This is usually the case for audio tracks and not the case for video -- which use a lot of data -- or subtitle tracks -- which have long gaps. For audio, there is usually a fixed output sampling frequency for the whole track. So the decoder should be able to recover the timestamp of each sample, knowing each output sample is contiguous with a fixed frequency. For subtitles this is usually not the case so lacing SHOULD NOT be used.¶
Random Access Points (RAPs) are positions that the parser can seek to and start playback from without decoding what came before. In Matroska, BlockGroups and SimpleBlocks can be RAPs.
To seek to these elements it is still necessary to seek to the Cluster
containing them,
read the Cluster Timestamp
and start playback from the BlockGroup
or SimpleBlock
that is a RAP.¶
Because a Matroska File is usually composed of multiple tracks playing at the same time -- video, audio and subtitles -- to seek properly to a RAP, each selected track must be taken into account. Usually all audio and subtitle BlockGroups and SimpleBlocks are RAPs.
They are independent of each other and can be played randomly.¶
Video tracks on the other hand often use references to previous and future frames for better
coding efficiency. Frames with such reference MUST either contain one or more
ReferenceBlock
Elements in their BlockGroup
or MUST be marked
as non-keyframe in a SimpleBlock
; see Section 10.2.¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- References a Block 40 Track Ticks before this one --> <ReferenceBlock>-40</ReferenceBlock> <Block/> </BlockGroup> ... </Cluster>¶
<Cluster> <Timestamp>123456</Timestamp> <SimpleBlock/> (octet 3 bit 0 not set) ... </Cluster>¶
Frames that are RAP -- i.e. they don't depend on other frames -- MUST set the keyframe
flag if they are in a SimpleBlock
or their parent BlockGroup
MUST NOT contain
a ReferenceBlock
.¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- No ReferenceBlock allowed in this BlockGroup --> <Block/> </BlockGroup> ... </Cluster>¶
<Cluster> <Timestamp>123456</Timestamp> <SimpleBlock/> (octet 3 bit 0 set) ... </Cluster>¶
There may be cases where the use of BlockGroup
is necessary, as the frame may need a
BlockDuration
, BlockAdditions
, CodecState
or a DiscardPadding
element.
For those cases, a SimpleBlock MUST NOT be used; the reference information SHOULD be recovered for non-RAP frames.¶
<Cluster> <Timestamp>123456</Timestamp> <SimpleBlock/> (octet 3 bit 0 not set) ... </Cluster>¶
BlockDuration
, with the EBML tree shown as XML:¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- ReferenceBlock value recovered based on the codec --> <ReferenceBlock>-40</ReferenceBlock> <BlockDuration>20</BlockDuration> <Block/> </BlockGroup> ... </Cluster>¶
When a frame in a BlockGroup
is not a RAP, the BlockGroup
MUST contain at least a ReferenceBlock
.
The ReferenceBlock
s MUST be used in one of the following ways:¶
ReferenceBlock
,¶
ReferenceBlock
, even if the timestamp value is accurate,¶
ReferenceBlock
with the timestamp value "0" corresponding to a self or unknown reference.¶
The lack of a ReferenceBlock would mean such a frame is a RAP, and seeking to a frame that actually depends on other frames may create bogus output or even a crash.¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- ReferenceBlock value not recovered from the codec --> <ReferenceBlock>0</ReferenceBlock> <BlockDuration>20</BlockDuration> <Block/> </BlockGroup> ... </Cluster>¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- References a Block 80 Track Ticks before this one --> <ReferenceBlock>-80</ReferenceBlock> <!-- References a Block 40 Track Ticks after this one --> <ReferenceBlock>40</ReferenceBlock> <Block/> </BlockGroup> ... </Cluster>¶
Intra-only video frames, such as the ones found in AV1 or VP9, can be decoded without any other
frame, but they don't reset the codec state. So seeking to these frames is not possible
as the next frames may need frames that are not known from this seeking point.
Such intra-only frames MUST NOT be considered as keyframes so the keyframe flag
MUST NOT be set in the SimpleBlock
or a ReferenceBlock
MUST be used
to signify the frame is not a RAP. The timestamp value of the ReferenceBlock
MUST
be "0", meaning it's referencing itself.¶
<Cluster> <Timestamp>123456</Timestamp> <BlockGroup> <!-- References itself to mark it should not be used as RAP --> <ReferenceBlock>0</ReferenceBlock> <Block/> </BlockGroup> ... </Cluster>¶
Because a video SimpleBlock has less reference information than a video BlockGroup, it is possible to remux a video track using BlockGroup into a SimpleBlock, as long as it doesn't use any other BlockGroup features than ReferenceBlock.¶
Historically timestamps in Matroska were mistakenly called timecodes. The Timestamp Element
was called Timecode, the TimestampScale Element
was called TimecodeScale, the
TrackTimestampScale Element
was called TrackTimecodeScale and the
ReferenceTimestamp Element
was called ReferenceTimeCode.¶
All timestamp values in Matroska are expressed in multiples of a tick. They are usually stored as integers. There are three types of ticks possible:¶
For such elements, the timestamp value is stored directly in nanoseconds.¶
The elements storing values in Matroska Ticks/nanoseconds are:¶
TrackEntry\DefaultDuration
; defined in Section 5.1.4.1.13¶
TrackEntry\DefaultDecodedFieldDuration
; defined in Section 5.1.4.1.14¶
TrackEntry\SeekPreRoll
; defined in Section 5.1.4.1.26¶
TrackEntry\CodecDelay
; defined in Section 5.1.4.1.25¶
BlockGroup\DiscardPadding
; defined in Section 5.1.3.5.7¶
ChapterAtom\ChapterTimeStart
; defined in Section 5.1.7.1.4.3¶
ChapterAtom\ChapterTimeEnd
; defined in Section 5.1.7.1.4.4¶
CuePoint\CueTime
; defined in Section 5.1.5.1.1¶
CueReference\CueRefTime
; defined in Section 5.1.5.1.1¶
Elements in Segment Ticks involve the use of the TimestampScale Element
of the Segment to get the timestamp
in nanoseconds of the element, with the following formula:¶
timestamp in nanoseconds = element value * TimestampScale¶
This allows storing smaller integer values in the elements.¶
When using the default value of TimestampScale
of "1,000,000", one Segment Tick represents one millisecond.¶
The elements storing values in Segment Ticks are:¶
Cluster\Timestamp
; defined in Section 5.1.3.1¶
Info\Duration
is stored as a floating-point but the same formula applies; defined in Section 5.1.2.10¶
CuePoint\CueTrackPositions\CueDuration
; defined in Section 5.1.5.1.2.4¶
Elements in Track Ticks involve the use of the TimestampScale Element
of the Segment and the TrackTimestampScale Element
of the Track
to get the timestamp in nanoseconds of the element, with the following formula:¶
timestamp in nanoseconds = element value * TrackTimestampScale * TimestampScale¶
This allows storing smaller integer values in the elements. The resulting floating-point values of the timestamps are still expressed in nanoseconds.¶
When using the default values for TimestampScale
and TrackTimestampScale
of "1,000,000" and of "1.0" respectively, one Track Tick represents one millisecond.¶
The elements storing values in Track Ticks are:¶
Cluster\BlockGroup\Block
and Cluster\SimpleBlock
timestamps; detailed in Section 11.2¶
Cluster\BlockGroup\BlockDuration
; defined in Section 5.1.3.5.3¶
Cluster\BlockGroup\ReferenceBlock
; defined in Section 5.1.3.5.5¶
When the TrackTimestampScale
is interpreted as "1.0", Track Ticks are equivalent to Segment Ticks
and give an integer value in nanoseconds. This is the most common case as TrackTimestampScale
is usually omitted.¶
A value of TrackTimestampScale
other than "1.0" MAY be used
to scale the timestamps more in tune with each Track sampling frequency.
For historical reasons, a lot of Matroska readers don't take the TrackTimestampScale value into account.
So using a value other than "1.0" might not work in many places.¶
A Block Element
and SimpleBlock Element
timestamp is the time when the decoded data of the first
frame in the Block/SimpleBlock MUST be presented, if the track of that Block/SimpleBlock is selected for playback.
This is also known as the Presentation Timestamp (PTS).¶
The Block Element
and SimpleBlock Element
store their timestamps as signed integers, relative
to the Cluster\Timestamp
value of the Cluster
they are stored in.
To get the timestamp of a Block
or SimpleBlock
in nanoseconds you have to use the following formula:¶
( Cluster\Timestamp + ( block timestamp * TrackTimestampScale ) ) * TimestampScale¶
The Block Element
and SimpleBlock Element
store their timestamps as 16bit signed integers,
allowing a range from "-32768" to "+32767" Track Ticks.
Although these values can be negative, when added to the Cluster\Timestamp
, the resulting frame timestamp SHOULD NOT be negative.¶
When a CodecDelay Element is set, its value MUST be subtracted from each Block timestamp of that track.
To get the timestamp in nanoseconds of the first frame in a Block
or SimpleBlock
, the formula becomes:¶
( ( Cluster\Timestamp + ( block timestamp * TrackTimestampScale ) ) * TimestampScale ) - CodecDelay¶
The resulting frame timestamp SHOULD NOT be negative.¶
During playback, when a frame has a negative timestamp, the content MUST be decoded by the decoder but not played to the user.¶
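The formulas above can be sketched as a single helper. The function names and default arguments are illustrative; the defaults mirror the default TimestampScale of 1,000,000 and TrackTimestampScale of 1.0, and rounding the result to the nearest nanosecond is an assumption of this sketch.¶

```python
def block_timestamp_ns(cluster_timestamp: int,
                       block_timestamp: int,
                       timestamp_scale: int = 1_000_000,
                       track_timestamp_scale: float = 1.0,
                       codec_delay: int = 0) -> int:
    """Timestamp in nanoseconds of the first frame of a Block or SimpleBlock."""
    if not -32768 <= block_timestamp <= 32767:
        raise ValueError("block timestamp must fit in a signed 16-bit integer")
    ticks = cluster_timestamp + block_timestamp * track_timestamp_scale
    return round(ticks * timestamp_scale) - codec_delay


def is_presented(timestamp_ns: int) -> bool:
    """Frames with a negative timestamp are decoded but not played to the user."""
    return timestamp_ns >= 0


ts = block_timestamp_ns(123456, -40)
print(ts, is_presented(ts))
early = block_timestamp_ns(0, 5, codec_delay=6_500_000)
print(early, is_presented(early))
```

The second call shows a frame that lands before time zero once CodecDelay is subtracted, so it would be decoded but not presented.¶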
The default Track Tick duration is one millisecond.¶
The TrackTimestampScale is a floating-point value, which is usually 1.0. But when it's not, the multiplied Block Timestamp is a floating-point value in nanoseconds.
The Matroska Reader SHOULD round that value to the nearest nanosecond to get the proper nanosecond timestamp of a Block. This allows some clever TrackTimestampScale values to have more refined timestamp precision per frame.¶
Matroska from version 1 through 3 uses language codes that can be either the three-letter bibliographic ISO 639-2 form [ISO639-2] (like "fre" for French),
or such a language code followed by a dash and a country code for regional language variants (like "fre-ca" for Canadian French).
The ISO 639-2 Language Elements
are "Language Element", "TagLanguage Element", and "ChapLanguage Element".¶
Starting in Matroska version 4, either [ISO639-2] or [BCP47] MAY be used,
although BCP 47
is RECOMMENDED. The BCP 47 Language Elements
are "LanguageBCP47 Element",
"TagLanguageBCP47 Element", and "ChapLanguageBCP47 Element". If a BCP 47 Language Element
and an ISO 639-2 Language Element
are used within the same Parent Element
, then the ISO 639-2 Language Element
MUST be ignored and precedence given to the BCP 47 Language Element
.¶
Country codes are the [BCP47] two-letter region subtag, without the UK exception.¶
This Matroska specification provides no interoperable solution for securing the
data container with any assurances of confidentiality, integrity, authenticity,
or to provide authorization. The ContentEncryption Element
(Section 5.1.4.1.31.8)
and associated sub-fields (Section 5.1.4.1.31.9 to Section 5.1.4.1.31.12) are defined
only for the benefit of implementers to construct their own proprietary solution
or as the basis for further standardization activities. How to use these
fields to secure a Matroska data container is out of scope, as are any related
issues such as key management and distribution.¶
A Matroska Reader that encounters containers that use the fields defined in this section MUST rely on out-of-scope guidance to decode the associated content.¶
Because encryption occurs within the Block Element
, it is possible to manipulate
encrypted streams without decrypting them. The streams could potentially be copied,
deleted, cut, appended, or any number of other possible editing techniques without
decryption. The data can be used without having to expose it or go through the decrypting process.¶
Encryption can also be layered within Matroska. This means that two completely different types of encryption can be used, requiring two separate keys to be able to decrypt a stream.¶
Encryption information is stored in the ContentEncodings Element
under the ContentEncryption Element
.¶
For encryption systems sharing public/private keys, the creation of the keys and the exchange of keys are not covered by this document. They have to be handled by the system using Matroska.¶
The algorithms described in Table 26 support different modes of operations and key sizes. The specification of these parameters is required for a complete solution, but is out of scope of this document and left to the proprietary implementations using them or subsequent profiles of this document.¶
The ContentEncodingScope Element gives an idea of which parts of the track are encrypted.
But each ContentEncAlgo Element and its sub-elements, like AESSettingsCipherMode, define exactly how the encrypted data should be interpreted.¶
An example of an extension that builds upon these security-related fields in this specification is [WebM-Enc].
It uses AES-CTR, ContentEncAlgo
= 5 (Section 5.1.4.1.31.9) and AESSettingsCipherMode
= 1 (Section 5.1.4.1.31.12).¶
A Matroska Writer MUST NOT use insecure cryptographic algorithms to create new archives or streams, but a Matroska Reader MAY support these algorithms to read previously made archives or streams.¶
The PixelCrop Elements (PixelCropTop, PixelCropBottom, PixelCropRight, and PixelCropLeft) indicate when, and by how much, encoded video frames SHOULD be cropped for display.
These Elements allow edges of the frame that are not intended for display, such as the
sprockets of a full-frame film scan or the VANC area of a digitized analog videotape,
to be stored but hidden. PixelCropTop
and PixelCropBottom
store an integer of how many
rows of pixels SHOULD be cropped from the top and bottom of the image (respectively).
PixelCropLeft
and PixelCropRight
store an integer of how many columns of pixels
SHOULD be cropped from the left and right of the image (respectively).¶
For example,
a pillar-boxed video that stores a 1440x1080 visual image within the center of a padded
1920x1080 encoded image may set both PixelCropLeft
and PixelCropRight
to "240",
so that a Matroska Player
should crop off 240 columns of pixels from the left and
right of the encoded image to present the image with the pillar-boxes hidden.¶
Cropping has to be performed before resizing and the display dimensions given by
DisplayWidth
, DisplayHeight
and DisplayUnit
apply to the already cropped image.¶
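The pillar-box example can be checked with a short sketch. The function name and keyword arguments are hypothetical; they stand for the PixelWidth and PixelHeight of the encoded image and the four PixelCrop values.¶

```python
def cropped_size(pixel_width: int, pixel_height: int,
                 left: int = 0, right: int = 0,
                 top: int = 0, bottom: int = 0):
    """Size of the image left after cropping, before any resizing for display."""
    width = pixel_width - left - right
    height = pixel_height - top - bottom
    if width <= 0 or height <= 0:
        raise ValueError("crop values remove the whole image")
    return width, height


# 1920x1080 encoded image with 240 columns cropped on each side.
print(cropped_size(1920, 1080, left=240, right=240))
```

The display dimensions are then applied to this cropped size, not to the encoded size.¶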
The ProjectionPoseRoll Element (see Section 5.1.4.1.28.46) can be used to indicate that the image from the associated video track SHOULD be rotated for presentation. For instance, the following representation of the Projection Element (Section 5.1.4.1.28.41) and the ProjectionPoseRoll Element represents a video track where the image SHOULD be presented with a 90-degree counter-clockwise rotation, with the EBML tree shown as XML:¶
The Segment Position
of an Element
refers to the position of the first octet of the
Element ID
of that Element
, measured in octets, from the beginning of the Element Data
section of the containing Segment Element
. In other words, the Segment Position
of an
Element
is the distance in octets from the beginning of its containing Segment Element
minus the size of the Element ID
and Element Data Size
of that Segment Element
.
The Segment Position
of the first Child Element
of the Segment Element
is 0.
An Element that is not stored within a Segment Element, such as the Elements of the EBML Header, does not have a Segment Position.¶
Elements
that are defined to store a Segment Position
MAY define reserved values to
indicate a special meaning.¶
This table presents an example of Segment Position
by showing a hexadecimal representation
of a very small Matroska file with labels to show the offsets in octets. The file contains
a Segment Element
with an Element ID
of "0x18538067" and a MuxingApp Element
with an Element ID
of "0x4D80".¶
      0                             1                             2
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5  6  7  8  9  0
     +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   0 |1A|45|DF|A3|8B|42|82|88|6D|61|74|72|6F|73|6B|61|
     ^ EBML Header
   0 |                                               |18|53|80|67|
                                                     ^ Segment ID
  20 |93|
     ^ Segment Data Size
  20 |  |15|49|A9|66|8E|4D|80|84|69|65|74|66|57|41|84|69|65|74|66|
        ^ Start of Segment data
  20 |                 |4D|80|84|69|65|74|66|57|41|84|69|65|74|66|
                       ^ MuxingApp start¶
In the above example, the Element ID
of the Segment Element
is stored at offset 16,
the Element Data Size
of the Segment Element
is stored at offset 20, and the
Element Data
of the Segment Element
is stored at offset 21.¶
The MuxingApp Element
is stored at offset 26. Since the Segment Position
of
an Element
is calculated by subtracting the position of the Element Data
of
the containing Segment Element
from the position of that Element
, the Segment Position
of MuxingApp Element
in the above example is '26 - 21' or '5'.¶
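The same arithmetic can be reproduced on the octets of the example file. This is only a sketch: the offsets are located with a plain byte search rather than a real EBML parser, the variable and function names are hypothetical, and the one-octet Element Data Size is specific to this example.¶

```python
EXAMPLE = bytes.fromhex(
    "1A45DFA38B4282886D6174726F736B61"  # EBML Header, offsets 0-15
    "18538067"                          # Segment Element ID, offsets 16-19
    "93"                                # Segment Element Data Size, offset 20
    "1549A9668E"                        # start of Segment data, offsets 21-25
    "4D808469657466"                    # MuxingApp Element, offsets 26-32
    "57418469657466"                    # following Element, offsets 33-39
)


def segment_position(element_offset: int, segment_data_offset: int) -> int:
    """Distance in octets from the start of the Segment's Element Data."""
    return element_offset - segment_data_offset


segment_id_offset = EXAMPLE.find(bytes.fromhex("18538067"))
segment_data_offset = segment_id_offset + 4 + 1   # 4-octet ID + 1-octet data size
muxing_app_offset = EXAMPLE.find(bytes.fromhex("4D80"), segment_data_offset)

print(segment_id_offset, segment_data_offset, muxing_app_offset)
print(segment_position(muxing_app_offset, segment_data_offset))
```

By the same rule, the first Child Element of the Segment, which starts at the first octet of the Segment data, has a Segment Position of 0.¶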
Matroska provides several methods to link two or more Segment Elements
together to create
a Linked Segment
. A Linked Segment
is a set of multiple Segments
linked together into
a single presentation by using Hard Linking or Medium Linking.¶
All Segments
within a Linked Segment
MUST have a SegmentUUID
.¶
All Segments
within a Linked Segment
SHOULD be stored within the same directory
or be accessible quickly based on their SegmentUUID
in order to have a seamless transition between segments.¶
All Segments
within a Linked Segment
MAY set a SegmentFamily
with a common value to make
it easier for a Matroska Player
to know which Segments
are meant to be played together.¶
The SegmentFilename
, PrevFilename
and NextFilename
elements MAY also give hints on
the original filenames that were used when the Segment links were created, in case some SegmentUUID
values are damaged.¶
Hard Linking, also called splitting, is the process of creating a Linked Segment
by linking multiple Segment Elements
using the NextUUID
and PrevUUID
Elements.¶
All Segments
within a Hard Linked Segment
MUST use the same Tracks
list and TimestampScale
.¶
Within a Linked Segment
, the timestamps of Block
and SimpleBlock
MUST follow consecutively
the timestamps of Block
and SimpleBlock
from the previous Segment
in linking order.¶
With Hard Linking, the chapters of any Segment
within the Linked Segment
MUST only reference the current Segment
.
The NextUUID
and PrevUUID
reference the respective SegmentUUID
values of the next and previous Segments
.¶
The first Segment
of a Linked Segment
MUST NOT have a PrevUUID Element
.
The last Segment
of a Linked Segment
MUST NOT have a NextUUID Element
.¶
For each node of the chain of Segments
of a Linked Segment
at least one Segment
MUST reference the other Segment
within the chain.¶
In a chain of Segments
of a Linked Segment
the NextUUID
always takes precedence over the PrevUUID
.
So if SegmentA has a NextUUID
to SegmentB and SegmentB has a PrevUUID
to SegmentC,
the link to use is NextUUID
between SegmentA and SegmentB; SegmentC is not part of the Linked Segment.¶
If SegmentB has a PrevUUID
to SegmentA but SegmentA has no NextUUID
, then the Matroska Player
MAY consider these two Segments linked as SegmentA followed by SegmentB.¶
As an example, three Segments
can be Hard Linked as a Linked Segment
through
cross-referencing each other with SegmentUUID
, PrevUUID
, and NextUUID
, as in this table:¶
file name | SegmentUUID | PrevUUID | NextUUID |
---|---|---|---|
start.mkv | 71000c23cd310998 53fbc94dd984a5dd | Invalid | a77b3598941cb803 eac0fcdafe44fac9 |
middle.mkv | a77b3598941cb803 eac0fcdafe44fac9 | 71000c23cd310998 53fbc94dd984a5dd | 6c92285fa6d3e827 b198d120ea3ac674 |
end.mkv | 6c92285fa6d3e827 b198d120ea3ac674 | a77b3598941cb803 eac0fcdafe44fac9 | Invalid |
Another example where only the NextUUID
Element is used:¶
file name | SegmentUUID | PrevUUID | NextUUID |
---|---|---|---|
start.mkv | 71000c23cd310998 53fbc94dd984a5dd | Invalid | a77b3598941cb803 eac0fcdafe44fac9 |
middle.mkv | a77b3598941cb803 eac0fcdafe44fac9 | n/a | 6c92285fa6d3e827 b198d120ea3ac674 |
end.mkv | 6c92285fa6d3e827 b198d120ea3ac674 | n/a | Invalid |
An example where only the PrevUUID
Element is used:¶
file name | SegmentUUID | PrevUUID | NextUUID |
---|---|---|---|
start.mkv | 71000c23cd310998 53fbc94dd984a5dd | Invalid | n/a |
middle.mkv | a77b3598941cb803 eac0fcdafe44fac9 | 71000c23cd310998 53fbc94dd984a5dd | n/a |
end.mkv | 6c92285fa6d3e827 b198d120ea3ac674 | a77b3598941cb803 eac0fcdafe44fac9 | Invalid |
In this example only the middle.mkv
is using the PrevUUID
and NextUUID
Elements:¶
file name | SegmentUUID | PrevUUID | NextUUID |
---|---|---|---|
start.mkv | 71000c23cd310998 53fbc94dd984a5dd | Invalid | n/a |
middle.mkv | a77b3598941cb803 eac0fcdafe44fac9 | 71000c23cd310998 53fbc94dd984a5dd | 6c92285fa6d3e827 b198d120ea3ac674 |
end.mkv | 6c92285fa6d3e827 b198d120ea3ac674 | n/a | Invalid |
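One possible way for a Matroska Player to resolve a Hard Linked chain from the rules above is sketched below. The `SegmentInfo` type and the function names are hypothetical, and UUIDs are shortened to readable strings. The sketch gives NextUUID precedence and only falls back to a PrevUUID reference when no NextUUID claims the same link; accepting the PrevUUID-only link is the optional (MAY) behavior described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SegmentInfo:
    uuid: str                        # SegmentUUID
    prev_uuid: Optional[str] = None  # PrevUUID, if present
    next_uuid: Optional[str] = None  # NextUUID, if present


def _successor(seg: SegmentInfo, segments: Dict[str, SegmentInfo]) -> Optional[SegmentInfo]:
    # NextUUID always takes precedence.
    if seg.next_uuid is not None:
        return segments.get(seg.next_uuid)
    # Otherwise accept a Segment whose PrevUUID names this one,
    # unless another Segment already points to it with NextUUID.
    for cand in segments.values():
        claimed = any(o.next_uuid == cand.uuid for o in segments.values())
        if cand.prev_uuid == seg.uuid and not claimed:
            return cand
    return None


def _predecessor(seg: SegmentInfo, segments: Dict[str, SegmentInfo]) -> Optional[SegmentInfo]:
    # A NextUUID pointing at this Segment wins over this Segment's PrevUUID.
    for cand in segments.values():
        if cand.next_uuid == seg.uuid:
            return cand
    prev = segments.get(seg.prev_uuid) if seg.prev_uuid else None
    if prev is not None and prev.next_uuid is None:
        return prev
    return None


def hard_link_order(segments: Dict[str, SegmentInfo], start_uuid: str) -> List[str]:
    """Return the SegmentUUIDs of the chain containing start_uuid, in playback order."""
    current = segments[start_uuid]
    seen = {current.uuid}
    while True:  # walk back to the first Segment of the chain
        prev = _predecessor(current, segments)
        if prev is None or prev.uuid in seen:
            break
        seen.add(prev.uuid)
        current = prev
    order = [current.uuid]
    seen = {current.uuid}
    while True:  # walk forward, guarding against reference loops
        nxt = _successor(current, segments)
        if nxt is None or nxt.uuid in seen:
            break
        order.append(nxt.uuid)
        seen.add(nxt.uuid)
        current = nxt
    return order
```

With the last table above (only middle.mkv carries PrevUUID and NextUUID), starting from any of the three Segments yields the order start, middle, end.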
Medium Linking creates relationships between Segments
using Ordered Chapters (Section 20.1.3) and the
ChapterSegmentUUID Element
. A Chapter Edition
with Ordered Chapters MAY contain
Chapter elements that reference timestamp ranges from other Segments
. The Segment
referenced by the Ordered Chapter via the ChapterSegmentUUID Element
SHOULD be played as
part of a Linked Segment.¶
The timestamps of Segment content referenced by Ordered Chapters MUST be adjusted according to the cumulative duration of the previous Ordered Chapters.¶
As an example, a file named intro.mkv
could have a SegmentUUID
of "0xb16a58609fc7e60653a60c984fc11ead".
Another file called program.mkv
could use a Chapter Edition that contains two Ordered Chapters.
The first chapter references the Segment
of intro.mkv
with the use of a ChapterSegmentUUID
,
ChapterSegmentEditionUID
, ChapterTimeStart
, and optionally a ChapterTimeEnd
element.
The second chapter references content within the Segment
of program.mkv
. A Matroska Player
SHOULD recognize the Linked Segment
created by the use of ChapterSegmentUUID
in an enabled
Edition
and present the reference content of the two Segments
as a single presentation.¶
The ChapterSegmentUUID
represents the Segment that holds the content to play in place of the Linked Chapter
.
The ChapterSegmentUUID
MUST NOT be the SegmentUUID
of its own Segment
.¶
There are 2 ways to use a chapter link: linked-duration chapter linking and linked-edition chapter linking.¶
A Matroska Player
MUST play the content of the linked Segment
from the ChapterTimeStart
until ChapterTimeEnd
timestamp in place of the Linked Chapter
.¶
ChapterTimeStart
and ChapterTimeEnd
represent timestamps in the Linked Segment matching the value of ChapterSegmentUUID
.
Their values MUST be in the range of the linked Segment duration.¶
The ChapterTimeEnd
value MUST be set when using linked-duration chapter linking.
ChapterSegmentEditionUID
MUST NOT be set.¶
A Matroska Player
MUST play the whole linked Edition
of the linked Segment in place of the Linked Chapter
.¶
ChapterSegmentEditionUID
represents a valid Edition from the Linked Segment matching the value of ChapterSegmentUUID
.¶
When using linked-edition chapter linking, ChapterTimeEnd
is OPTIONAL.¶
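The two kinds of chapter link can be told apart by which elements are present. The following Python sketch is one reading of the constraints above, not a normative algorithm: the `LinkedChapter` type and `link_mode` function are hypothetical, and the choice to treat the presence of ChapterSegmentEditionUID as the discriminator is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedChapter:
    segment_uuid: str                          # ChapterSegmentUUID
    time_start: int                            # ChapterTimeStart, in nanoseconds
    time_end: Optional[int] = None             # ChapterTimeEnd, in nanoseconds
    segment_edition_uid: Optional[int] = None  # ChapterSegmentEditionUID


def link_mode(chapter: LinkedChapter, own_segment_uuid: str) -> str:
    """Classify a Linked Chapter and reject combinations the text forbids."""
    if chapter.segment_uuid == own_segment_uuid:
        raise ValueError("ChapterSegmentUUID MUST NOT be the SegmentUUID of its own Segment")
    if chapter.segment_edition_uid is not None:
        # Linked-edition: the whole linked Edition is played; ChapterTimeEnd is OPTIONAL.
        return "linked-edition"
    if chapter.time_end is None:
        raise ValueError("ChapterTimeEnd MUST be set for linked-duration chapter linking")
    # Linked-duration: play ChapterTimeStart..ChapterTimeEnd of the linked Segment.
    return "linked-duration"
```

A complete check would also verify that ChapterTimeStart and ChapterTimeEnd fall within the duration of the linked Segment, which requires access to that Segment and is left out here.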
The "default track" flag is a hint for a Matroska Player
indicating that a given track
SHOULD be eligible to be automatically selected as the default track for a given
language. If no tracks in a given language have the default track flag set, then all tracks
in that language are eligible for automatic selection. This can be used to indicate that
a track provides "regular service" suitable for users with default settings, as opposed to
specialized services, such as commentary, hearing-impaired captions, or descriptive audio.¶
The Matroska Player
MAY override the "default track" flag for any reason, including
user preferences to prefer tracks providing accessibility services.¶
The "forced" flag tells the Matroska Player
that it SHOULD display this subtitle track,
even if user preferences usually would not call for any subtitles to be displayed alongside
the current selected audio track. This can be used to indicate that a track contains translations
of onscreen text, or of dialogue spoken in a different language than the track's primary one.¶
The "hearing impaired" flag tells the Matroska Player
that it SHOULD prefer this track
when selecting a default track for a hearing-impaired user, and that it MAY prefer to select
a different track when selecting a default track for a non-hearing-impaired user.¶
The "visual impaired" flag tells the Matroska Player
that it SHOULD prefer this track
when selecting a default track for a visually-impaired user, and that it MAY prefer to select
a different track when selecting a default track for a non-visually-impaired user.¶
The "descriptions" flag tells the Matroska Player
that this track is suitable to play via
a text-to-speech system for a visually-impaired user, and that it SHOULD NOT automatically
select this track when selecting a default track for a non-visually-impaired user.¶
The "original" flag tells the Matroska Player
that this track is in the original language,
and that it SHOULD prefer it if configured to prefer original-language tracks of this
track's type.¶
The "commentary" flag tells the Matroska Player
that this track contains commentary on
the content.¶
TrackOperation
allows combining multiple tracks to make a virtual one. It uses
two separate systems to combine tracks: one to create a 3D "composition" (left/right/background planes)
and one to simply join two tracks together to make a single track.¶
A track created with TrackOperation
is a proper track with a UID and all its flags.
However, the codec ID is meaningless because each "sub" track needs to be decoded by its
own decoder before the "operation" is applied. The Cues Elements
corresponding to such
a virtual track SHOULD be the union of the Cues Elements
for each of the tracks it's composed of (when the Cues
are defined per track).¶
In the case of TrackJoinBlocks
, the Block Elements
(from BlockGroup
and SimpleBlock
)
of all the tracks SHOULD be used as if they were defined for this new virtual Track
.
When two Block Elements
have overlapping start or end timestamps, it's up to the underlying
system to either drop some of these frames or render them the way they overlap.
This situation SHOULD be avoided when creating such tracks as you can never be sure
of the end result on different platforms.¶
An overlay track SHOULD be rendered in the same channel as the track it is linked to. When content is found in such a track, it SHOULD be played on the rendering channel instead of the original track.¶
There are two different ways to compress 3D videos: store each eye in a separate track, or store both eyes combined in a single track (which is more efficient, compression-wise). Matroska supports both ways.¶
For the single track variant, there is the StereoMode Element
, which defines how planes are
assembled in the track (mono or left-right combined). Odd values of StereoMode mean the left
plane comes first for more convenient reading. The pixel count of the track (PixelWidth
/PixelHeight
)
is the raw amount of pixels, for example 3840x1080 for full HD side by side, and the DisplayWidth
/DisplayHeight
in pixels is the amount of pixels for one plane (1920x1080 for that full HD stream).
Old stereo 3D movies were displayed using anaglyph (cyan and red colors separated).
For compatibility with such movies, there is a value of the StereoMode that corresponds to anaglyph.¶
There is also a "packed" mode (values 13 and 14) which consists of packing two frames together
in a Block
using lacing. The first frame is the left eye and the other frame is the right eye
(or vice versa). The frames SHOULD be decoded in that order and are possibly dependent
on each other (P and B frames).¶
For separate tracks, Matroska needs to define exactly which track does what.
TrackOperation
with TrackCombinePlanes
does that. For more details, look at
Section 18.8 on how TrackOperation works.¶
The 3D support is still in its infancy and may evolve to support more features.¶
The StereoMode used to be part of Matroska v2 but it didn't meet the requirement
for multiple tracks. There was also a bug in libmatroska prior to 0.9.0 that would save/read
it as 0x53B9
instead of 0x53B8
; see OldStereoMode (Section 5.1.4.1.28.5). Matroska Readers
MAY support these legacy files by checking for
Matroska v2 or 0x53B9
.
The older values of StereoMode were 0: mono, 1: right eye, 2: left eye, and 3: both eyes; these are the only values that can be found in OldStereoMode.
They are not compatible with the StereoMode values found in Matroska v3 and above.¶
This section provides some example sets of Tracks and hypothetical user settings, along with
indications of which ones a similarly-configured Matroska Player
SHOULD automatically
select for playback by default in such a situation. A player MAY provide additional settings
with more detailed controls for more nuanced scenarios. These examples are provided as guidelines
to illustrate the intended usages of the various supported Track flags, and their expected behaviors.¶
Track names are shown in English for illustrative purposes; actual files may have titles in the language of each track, or provide titles in multiple languages.¶
Example track set:¶
No. | Type | Lang | Layout | Original | Default | Other flags | Name |
---|---|---|---|---|---|---|---|
1 | Video | und | N/A | N/A | N/A | None | |
2 | Audio | eng | 5.1 | 1 | 1 | None | |
3 | Audio | eng | 2.0 | 1 | 1 | None | |
4 | Audio | eng | 2.0 | 1 | 0 | Visual-impaired | Descriptive audio |
5 | Audio | esp | 5.1 | 0 | 1 | None | |
6 | Audio | esp | 2.0 | 0 | 0 | Visual-impaired | Descriptive audio |
7 | Audio | eng | 2.0 | 1 | 0 | Commentary | Director's Commentary |
8 | Audio | eng | 2.0 | 1 | 0 | None | Karaoke |
Here we have a file with 7 audio tracks, of which 5 are in English and 2 are in Spanish.¶
The English tracks all have the Original flag, indicating that English is the original content language.¶
Generally the player will first consider the track languages: if the player has an option to prefer original-language audio and the user has enabled it, then it should prefer one of the Original-flagged tracks. If configured to specifically prefer audio tracks in English or Spanish, the player should select one of the tracks in the corresponding language. The player may also wish to prefer an Original-flagged track if no tracks matching any of the user's explicitly-preferred languages are available.¶
Two of the tracks have the Visual-impaired flag. If the player has been configured to prefer such tracks, it should select one; otherwise, it should avoid them if possible.¶
If selecting an English track, when other settings have left multiple possible options, it may be useful to exclude the tracks that lack the Default flag: here, one provides descriptive service for the visually impaired (which has its own flag and may be automatically selected by user configuration, but is unsuitable for users with default-configured players), one is a commentary track (which has its own flag, which the player may or may not have specialized handling for), and the last contains karaoke versions of the music that plays during the film, which is an unusual specialized audio service that Matroska has no built-in support for indicating, so it's indicated in the track name instead. By not setting the Default flag on these specialized tracks, the file's author hints that they should not be automatically selected by a default-configured player.¶
Having narrowed its choices down, our example player now may have to select between tracks 2 and 3. The only difference between these tracks is their channel layouts: 2 is 5.1 surround, while 3 is stereo. If the player is aware that the output device is a pair of headphones or stereo speakers, it may wish to prefer the stereo mix automatically. On the other hand, if it knows that the device is a surround system, it may wish to prefer the surround mix.¶
If the player finishes analyzing all of the available audio tracks and finds that multiple tracks seem equally and maximally preferable, it SHOULD default to the first of the group.¶
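The narrowing process described for this audio example could look like the following Python sketch. It is only one plausible ordering of the heuristics, not a required algorithm; the `AudioTrack` type, the `select_audio` function, and its parameters are hypothetical. Each step keeps the matching tracks only if at least one matches, so a preference never empties the candidate list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AudioTrack:
    number: int
    lang: str
    layout: str
    original: bool
    default: bool
    visual_impaired: bool = False


def _narrow(tracks: List[AudioTrack], keep: Callable[[AudioTrack], bool]) -> List[AudioTrack]:
    """Keep only matching tracks, unless that would leave nothing to choose from."""
    kept = [t for t in tracks if keep(t)]
    return kept or tracks


def select_audio(
    tracks: List[AudioTrack],
    preferred_lang: Optional[str] = None,
    prefer_original: bool = False,
    prefer_visual_impaired: bool = False,
    output_layout: Optional[str] = None,
) -> Optional[AudioTrack]:
    candidates = list(tracks)
    if preferred_lang is not None:
        candidates = _narrow(candidates, lambda t: t.lang == preferred_lang)
    if prefer_original:
        candidates = _narrow(candidates, lambda t: t.original)
    candidates = _narrow(candidates, lambda t: t.visual_impaired == prefer_visual_impaired)
    candidates = _narrow(candidates, lambda t: t.default)
    if output_layout is not None:
        candidates = _narrow(candidates, lambda t: t.layout == output_layout)
    # Several equally preferable tracks: default to the first of the group.
    return candidates[0] if candidates else None


# Audio tracks 2-8 from the example track set above.
EXAMPLE_TRACKS = [
    AudioTrack(2, "eng", "5.1", True, True),
    AudioTrack(3, "eng", "2.0", True, True),
    AudioTrack(4, "eng", "2.0", True, False, visual_impaired=True),
    AudioTrack(5, "esp", "5.1", False, True),
    AudioTrack(6, "esp", "2.0", False, False, visual_impaired=True),
    AudioTrack(7, "eng", "2.0", True, False),
    AudioTrack(8, "eng", "2.0", True, False),
]
```

For a user who prefers English and has no other settings, this sketch selects track 2; adding a stereo output layout selects track 3 instead.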
Example track set:¶
No. | Type | Lang | Original | Default | Forced | Other flags | Name |
---|---|---|---|---|---|---|---|
1 | Video | und | N/A | N/A | N/A | None | |
2 | Audio | fra | 1 | 1 | N/A | None | |
3 | Audio | por | 0 | 1 | N/A | None | |
4 | Subtitles | fra | 1 | 1 | 0 | None | |
5 | Subtitles | fra | 1 | 0 | 0 | Hearing-impaired | Captions for the hearing-impaired |
6 | Subtitles | por | 0 | 1 | 0 | None | |
7 | Subtitles | por | 0 | 0 | 1 | None | Signs |
8 | Subtitles | por | 0 | 0 | 0 | Hearing-impaired | SDH |
Here we have 2 audio tracks and 5 subtitle tracks. As we can see, French is the original language.¶
We'll start by discussing the case where the user prefers French (or Original-language) audio (or has explicitly selected the French audio track), and also prefers French subtitles.¶
In this case, if the player isn't configured to display captions when the audio matches their preferred subtitle languages, the player doesn't need to select a subtitle track at all.¶
If the user has indicated that they want captions to be displayed, the selection simply comes down to whether Hearing-impaired subtitles are preferred.¶
The situation for a user who prefers Portuguese subtitles starts out somewhat analogous. If they select the original French audio (either by explicit audio language preference, preference for Original-language tracks, or by explicitly selecting that track), then the selection once again comes down to the hearing-impaired preference.¶
However, the case where the Portuguese audio track is selected has an important catch: a Forced track in Portuguese is present. This may contain translations of onscreen text from the video track, or of portions of the audio that are not translated (music, for instance). This means that even if the user's preferences wouldn't normally call for captions here, the Forced track should be selected nonetheless, rather than selecting no track at all. On the other hand, if the user's preferences do call for captions, the non-Forced tracks should be preferred, as the Forced track will not contain captioning for the dialogue.¶
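The subtitle cases walked through above can be condensed into a short Python sketch. The `SubtitleTrack` type, the `select_subtitle` function, and the `always_show_captions` setting are hypothetical, and the sketch assumes that full subtitles are wanted whenever the audio language differs from the preferred subtitle language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SubtitleTrack:
    number: int
    lang: str
    forced: bool = False
    hearing_impaired: bool = False


def select_subtitle(
    tracks: List[SubtitleTrack],
    audio_lang: str,
    subtitle_lang: str,
    always_show_captions: bool = False,
    prefer_hearing_impaired: bool = False,
) -> Optional[SubtitleTrack]:
    in_lang = [t for t in tracks if t.lang == subtitle_lang]
    if always_show_captions or audio_lang != subtitle_lang:
        # Full subtitles are wanted: Forced tracks lack the dialogue, so skip them.
        full = [t for t in in_lang if not t.forced]
        matching = [t for t in full if t.hearing_impaired == prefer_hearing_impaired]
        pool = matching or full
        return pool[0] if pool else None
    # No captions requested: only a Forced track, if any, is selected.
    forced = [t for t in in_lang if t.forced]
    return forced[0] if forced else None


# Subtitle tracks 4-8 from the example track set above.
EXAMPLE_SUBTITLES = [
    SubtitleTrack(4, "fra"),
    SubtitleTrack(5, "fra", hearing_impaired=True),
    SubtitleTrack(6, "por"),
    SubtitleTrack(7, "por", forced=True),
    SubtitleTrack(8, "por", hearing_impaired=True),
]
```

With Portuguese audio and Portuguese subtitles but no request for captions, the sketch returns the Forced track 7; with French audio and French subtitles under the same settings, it returns no track.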
The Matroska Chapters system can have multiple Editions
and each Edition
can consist of
Simple Chapters
where a chapter start time is used as a marker in the timeline only. An
Edition
can be more complex with Ordered Chapters
where a chapter end timestamp is additionally
used or much more complex with Linked Chapters
. The Matroska Chapters system can also have a menu
structure, borrowed from the DVD menu system [DVD-Video], or have its own built-in Matroska menu structure.¶
The EditionEntry
is also called an Edition
.
An Edition
contains a set of Edition
flags and MUST contain at least one ChapterAtom Element
.
Chapters are always inside an Edition
(or inside a Chapter that is itself part of an Edition
).
Multiple Editions are allowed. Some of these Editions MAY be ordered and others not.¶
Only one Edition
SHOULD have an EditionFlagDefault
flag set to true
.¶
The Default Edition
is the Edition
that a Matroska Player
SHOULD use for playback by default.¶
The first Edition
with the EditionFlagDefault
flag set to true
is the Default Edition
.¶
When all EditionFlagDefault
flags are set to false
, then the first Edition
is the Default Edition
.¶
Edition | FlagDefault | Default Edition |
---|---|---|
Edition 1 | true | X |
Edition 2 | true | |
Edition 3 | true | |
Edition | FlagDefault | Default Edition |
---|---|---|
Edition 1 | false | X |
Edition 2 | false | |
Edition 3 | false | |
Edition | FlagDefault | Default Edition |
---|---|---|
Edition 1 | false | |
Edition 2 | true | X |
Edition 3 | false | |
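The Default Edition rule illustrated by the three tables above is small enough to express directly. In this Python sketch the function name `default_edition_index` is hypothetical, and the input is simply the list of EditionFlagDefault values in storage order.

```python
from typing import Sequence


def default_edition_index(flag_defaults: Sequence[bool]) -> int:
    """Return the index of the Default Edition given each Edition's EditionFlagDefault."""
    if not flag_defaults:
        raise ValueError("at least one Edition is needed")
    for index, is_default in enumerate(flag_defaults):
        if is_default:
            return index  # the first Edition flagged as default wins
    return 0  # no flag set: the first Edition is the Default Edition


print(default_edition_index([True, True, True]))     # first table
print(default_edition_index([False, False, False]))  # second table
print(default_edition_index([False, True, False]))   # third table
```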
The EditionFlagOrdered Flag
is a significant feature as it enables an Edition
of Ordered Chapters
which defines and arranges a virtual timeline rather than simply
labeling points within the timeline. For example, with Editions
of Ordered Chapters
a single Matroska file
can present multiple edits of a film without duplicating content.
Alternatively, if a videotape is digitized in full, one Ordered Edition
could present
the full content (including colorbars, countdown, slate, a feature presentation, and
black frames), while another Edition
of Ordered Chapters
can use Chapters
that only
mark the intended presentation with the colorbars and other ancillary visual information
excluded. If an Edition
of Ordered Chapters
is enabled, then the Matroska Player
MUST
play those Chapters in their stored order from the timestamp marked in the
ChapterTimeStart Element
to the timestamp marked in the ChapterTimeEnd Element
.¶
If the EditionFlagOrdered Flag
evaluates to "0", Simple Chapters
are used and
only the ChapterTimeStart
of a Chapter
is used as a chapter mark to jump to the
predefined point in the timeline. With Simple Chapters
, a Matroska Player
MUST
ignore certain Chapter Elements
. In that case these elements are informational only.¶
The following list shows the different Chapter elements only found in Ordered Chapters
.¶
Ordered Chapter elements |
---|
ChapterAtom/ChapterSegmentUUID |
ChapterAtom/ChapterSegmentEditionUID |
ChapterAtom/ChapterTrack |
ChapterAtom/ChapProcess |
Info/ChapterTranslate |
TrackEntry/TrackTranslate |
Furthermore, there are other EBML Elements
which could be used if the EditionFlagOrdered
evaluates to "1".¶
Hard Linking: Ordered Chapters
supersede Hard Linking
.¶
Medium Linking: Ordered Chapters
are used in a normal way and can be combined
with the ChapterSegmentUUID
element which establishes a link to another Segment.¶
See Section 17 on the Linked Segments for more information
about Hard Linking
and Medium Linking
.¶
The ChapterAtom
is also called a Chapter
.¶
The timestamp of the start of Chapter
with nanosecond accuracy, not scaled by TimestampScale.
For Simple Chapters
this is the position of the chapter markers in the timeline.¶
The timestamp of the end of Chapter
with nanosecond accuracy, not scaled by TimestampScale.
The timestamp defined by the ChapterTimeEnd
is not part of the Chapter
.
A Matroska Player
calculates the duration of this Chapter
using the difference between the
ChapterTimeEnd
and ChapterTimeStart
.
The end timestamp MUST be greater than or equal to the start timestamp.¶
When the ChapterTimeEnd
timestamp is equal to the ChapterTimeStart
timestamp,
the timestamp is included in the Chapter
. It can be useful to put markers in
a file or add chapter commands with ordered chapter commands without having to play anything;
see Section 5.1.7.1.4.14.¶
Chapter | Start timestamp | End timestamp | Duration |
---|---|---|---|
Chapter 1 | 0 | 1000000000 | 1000000000 |
Chapter 2 | 1000000000 | 5000000000 | 4000000000 |
Chapter 3 | 6000000000 | 6000000000 | 0 |
Chapter 4 | 9000000000 | 8000000000 | Invalid (-1000000000) |
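The duration column of the table above follows directly from the two timestamps. A minimal Python sketch, with the hypothetical helper name `chapter_duration_ns`, is:

```python
def chapter_duration_ns(time_start: int, time_end: int) -> int:
    """Duration of a Chapter from ChapterTimeStart and ChapterTimeEnd, in nanoseconds.

    Both values are plain nanosecond timestamps, not scaled by TimestampScale.
    """
    if time_end < time_start:
        raise ValueError("ChapterTimeEnd MUST be greater than or equal to ChapterTimeStart")
    return time_end - time_start


print(chapter_duration_ns(0, 1_000_000_000))              # Chapter 1
print(chapter_duration_ns(1_000_000_000, 5_000_000_000))  # Chapter 2
print(chapter_duration_ns(6_000_000_000, 6_000_000_000))  # Chapter 3, zero duration
```

Chapter 4 from the table would raise the error, since its end timestamp precedes its start timestamp.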
A ChapterAtom
element can contain other ChapterAtom
elements.
That element is a Parent Chapter
and the ChapterAtom
elements it contains are Nested Chapters
.¶
Nested Chapters can be useful to tag small parts of a Segment that already have tags or add Chapter Codec commands on smaller parts of a Segment that already have Chapter Codec commands.¶
The ChapterTimeStart
of a Nested Chapter
MUST be greater than or equal to the ChapterTimeStart
of its Parent Chapter
.¶
If the Parent Chapter
of a Nested Chapter
has a ChapterTimeEnd
, the ChapterTimeStart
of that Nested Chapter
MUST be smaller than or equal to the ChapterTimeEnd
of the Parent Chapter
.¶
The ChapterTimeEnd
of the lowest level of Nested Chapters
MUST be set for Ordered Chapters.¶
When used with Ordered Chapters, the ChapterTimeEnd
value of a Parent Chapter
is useless for playback
as the proper playback sections are described in its Nested Chapters
.
The ChapterTimeEnd
SHOULD NOT be set in Parent Chapters
and MUST be ignored for playback.¶
Each level can have different meanings for audio and video. The ORIGINAL_MEDIA_TYPE
tag [MatroskaTags] can be used to
specify a string for ChapterPhysicalEquiv = 60. Here is the list of possible levels for both audio and video:¶
Value | Audio | Video | Comment |
---|---|---|---|
70 | SET / PACKAGE | SET / PACKAGE | the collection of different media |
60 | CD / 12" / 10" / 7" / TAPE / MINIDISC / DAT | DVD / VHS / LASERDISC | the physical medium like a CD or a DVD |
50 | SIDE | SIDE | when the original medium (LP/DVD) has different sides |
40 | - | LAYER | another physical level on DVDs |
30 | SESSION | SESSION | as found on CDs and DVDs |
20 | TRACK | - | as found on audio CDs |
10 | INDEX | - | the first logical level of the side/medium |
In this example a movie is split in different chapters. It could also just be an audio file (album) on which each track corresponds to a chapter.¶
This would translate into the following Matroska form, with the EBML tree shown as XML:¶
In this example an (existing) album is split into different chapters, and one of them contains another splitting.¶
00:00 - 12:28 : Baby Wants To Bleep/Rock¶
This would translate into the following Matroska form, with the EBML tree shown as XML:¶
Matroska supports storage of related files and data in the Attachments Element
(a Top-Level Element
). Attachment Elements
can be used to store related cover art,
font files, transcripts, reports, error recovery files, picture or text-based annotations,
copies of specifications, or other ancillary files related to the Segment
.¶
Matroska Readers
MUST NOT execute files stored as Attachment Elements
.¶
This section defines a set of guidelines for the storage of cover art in Matroska files.
A Matroska Reader
MAY use embedded cover art to display a representational
still-image depiction of the multimedia contents of the Matroska file.¶
Only [JPEG] and PNG [RFC2083] image formats SHOULD be used for cover art pictures.¶
There can be two different covers for a movie/album: a portrait style (e.g., a DVD case) and a landscape style (e.g., a wide banner ad).¶
There can be two versions of the same cover, the normal cover
and the small cover
.
The dimension of the normal cover
SHOULD be 600 pixels on the smallest side -- for example,
960x600 for landscape, 600x800 for portrait, or 600x600 for square. The dimension of
the small cover
SHOULD be 120 pixels on the smallest side -- for example, 192x120 or 120x160.¶
Versions of cover art can be differentiated by the filename, which is stored in the
FileName Element
. The default filename of the normal cover
in square or portrait mode
is cover.(jpg|png)
. When stored, the normal cover
SHOULD be the first Attachment in
storage order. The small cover
SHOULD be prefixed with "small_", such as
small_cover.(jpg|png)
. The landscape variant SHOULD be suffixed with "_land",
such as cover_land.(jpg|png)
. The filenames are case-sensitive.¶
The following table provides examples of file names for cover art in Attachments.¶
FileName | Image Orientation | Pixel Length of Smallest Side |
---|---|---|
cover.jpg | Portrait or square | 600 |
small_cover.png | Portrait or square | 120 |
cover_land.png | Landscape | 600 |
small_cover_land.jpg | Landscape | 120 |
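A Matroska Reader could recognize these cover art names with a single case-sensitive pattern. The Python sketch below is illustrative only; the `classify_cover` function and its return shape are hypothetical, and the sizes it reports are the recommended smallest-side lengths from the guidelines above, not values read from the image.

```python
import re
from typing import Optional, Tuple

# Optional "small_" prefix, optional "_land" suffix, then .jpg or .png (case-sensitive).
COVER_NAME = re.compile(r"(small_)?cover(_land)?\.(jpg|png)")


def classify_cover(file_name: str) -> Optional[Tuple[str, int]]:
    """Return (orientation, recommended smallest side in pixels), or None if not cover art."""
    match = COVER_NAME.fullmatch(file_name)
    if match is None:
        return None
    orientation = "landscape" if match.group(2) else "portrait or square"
    smallest_side = 120 if match.group(1) else 600
    return orientation, smallest_side


for name in ("cover.jpg", "small_cover.png", "cover_land.png", "small_cover_land.jpg"):
    print(name, classify_cover(name))
```

Because the filenames are case-sensitive, a name such as `Cover.JPG` is not treated as cover art by this sketch.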
Font files MAY be added to a Matroska file as Attachments so that the font file may be used to display an associated subtitle track. This allows the presentation of a Matroska file to be consistent in various environments where the needed fonts might not be available on the local system.¶
Depending on the font format in question, each font file can contain multiple font variants.
Each font variant has a name which will be referred to as Font Name from now on.
This Font Name can be different from the Attachment's FileName
, even when disregarding the extension.
In order to select a font for display, a Matroska player SHOULD consider both the Font Name
and the base name of the Attachment's FileName, preferring the former when there are multiple matches.¶
Subtitle codecs, such as SubStation Alpha (SSA/ASS), usually refer to a font by its Font Name, not by its filename. If none of the Attachments are a match for the Font Name, the Matroska player SHOULD attempt to find a system font whose Font Name matches the one used in the subtitle track.¶
Since loading fonts temporarily can take a while, a Matroska player usually loads or installs all the fonts found in attachments so they are ready to be used during playback. Failure to use the font attachment might result in incorrect rendering of the subtitles.¶
If a selected subtitle track has some AttachmentLink
elements, the player MAY restrict its font rendering to use only these fonts.¶
A Matroska player SHOULD handle the official font media types from [RFC8081] when the system can handle the type:¶
font/sfnt
: Generic SFNT Font Type,¶
font/ttf
: TTF Font Type,¶
font/otf
: OpenType Layout (OTF) Font Type,¶
font/collection
: Collection Font Type,¶
font/woff
: WOFF 1.0,¶
font/woff2
: WOFF 2.0.¶
Fonts in Matroska existed long before [RFC8081]. A few unofficial media types for fonts were used in existing files. Therefore, it is RECOMMENDED for a Matroska player to support the following legacy media types for font attachments:¶
application/x-truetype-font
: TrueType fonts, equivalent to font/ttf
and sometimes font/otf
,¶
application/x-font-ttf
: TTF fonts, equivalent to font/ttf
,¶
application/vnd.ms-opentype
: OpenType Layout fonts, equivalent to font/otf
,¶
application/font-sfnt
: Generic SFNT Font Type, equivalent to font/sfnt
,¶
application/font-woff
: WOFF 1.0, equivalent to font/woff
.¶
There may also be some font attachments with the application/octet-stream
media type.
In that case the Matroska player MAY try to guess the font type by checking the file extension of the AttachedFile\FileName
string.
Common file extensions for fonts are:¶
.ttf
for TrueType fonts, equivalent to font/ttf
,¶
.otf
for OpenType Layout fonts, equivalent to font/otf
,¶
.ttc
for Collection fonts, equivalent to font/collection
.¶
The file extension check MUST be case-insensitive.¶
Matroska writers SHOULD use a valid font media type from [RFC8081] in the AttachedFile\FileMediaType
of the font attachment.
They MAY use the media types found in older files when compatibility with older players is necessary.¶
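The media type handling described above can be sketched as a lookup with an extension-based fallback. The function name `font_media_type` is hypothetical, and mapping `application/x-truetype-font` to `font/ttf` is a simplification, since the text notes that this legacy type sometimes designates `font/otf` content.

```python
import os
from typing import Optional

OFFICIAL_FONT_TYPES = {
    "font/sfnt", "font/ttf", "font/otf", "font/collection", "font/woff", "font/woff2",
}

LEGACY_FONT_TYPES = {
    "application/x-truetype-font": "font/ttf",  # simplification: may also hold OTF data
    "application/x-font-ttf": "font/ttf",
    "application/vnd.ms-opentype": "font/otf",
    "application/font-sfnt": "font/sfnt",
    "application/font-woff": "font/woff",
}

FONT_EXTENSIONS = {".ttf": "font/ttf", ".otf": "font/otf", ".ttc": "font/collection"}


def font_media_type(file_media_type: str, file_name: str) -> Optional[str]:
    """Map an attachment's FileMediaType and FileName to an official font media type."""
    media_type = file_media_type.strip().lower()
    if media_type in OFFICIAL_FONT_TYPES:
        return media_type
    if media_type in LEGACY_FONT_TYPES:
        return LEGACY_FONT_TYPES[media_type]
    if media_type == "application/octet-stream":
        # Guess from the file extension; the check is case-insensitive.
        extension = os.path.splitext(file_name)[1].lower()
        return FONT_EXTENSIONS.get(extension)
    return None  # not recognized as a font attachment
```

For example, an attachment named `Arial.TTF` with the media type `application/octet-stream` is treated as `font/ttf` by this sketch.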
The Cues Element
provides an index of certain Cluster Elements
to allow for optimized
seeking to absolute timestamps within the Segment
. The Cues Element
contains one or
more CuePoint Elements
, each of which MUST reference an absolute timestamp (via the
CueTime Element
), a Track
(via the CueTrack Element
), and a Segment Position
(via the CueClusterPosition Element
). Additional non-mandated Elements are part of
the CuePoint Element
such as CueDuration
, CueRelativePosition
, CueCodecState
and others which provide any Matroska Reader
with additional information to use in
the optimization of seeking performance.¶
The following recommendations are provided to optimize Matroska performance.¶
Unless Matroska is used as a live stream, it SHOULD contain a Cues Element
.¶
For each video track, each keyframe SHOULD be referenced by a CuePoint Element
.¶
It is RECOMMENDED to not reference non-keyframes of video tracks in Cues
unless
it references a Cluster Element
which contains a CodecState Element
but no keyframes.¶
For each subtitle track present, each subtitle frame SHOULD be referenced by a
CuePoint Element
with a CueDuration Element
.¶
References to audio tracks MAY be skipped in CuePoint Elements
if a video track
is present. When included, the CuePoint Elements
SHOULD reference audio keyframes
at most once every 500 milliseconds.¶
If the referenced frame is not stored within the first SimpleBlock
, or first
BlockGroup
within its Cluster Element
, then the CueRelativePosition Element
SHOULD be written to reference where in the Cluster
the reference frame is stored.¶
If a CuePoint Element
references a Cluster Element
that includes a CodecState Element
,
then that CuePoint Element
MUST use a CueCodecState Element
.¶
CuePoint Elements
SHOULD be numerically sorted in storage order by the value of the CueTime Element
.¶
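Sorting CuePoint Elements by CueTime lets a Matroska Reader locate a seek target with a binary search. The Python sketch below assumes the cues have already been parsed into `(cue_time, cue_cluster_position)` pairs for a single track; that representation, the `find_cue` function, and the sample values are all hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Cue = Tuple[int, int]  # (CueTime, CueClusterPosition as a Segment Position)


def find_cue(cues: List[Cue], target_time: int) -> Optional[Cue]:
    """Return the last cue at or before target_time, or None if the target precedes all cues.

    The list must be sorted by CueTime, as recommended above.
    """
    times = [cue_time for cue_time, _ in cues]
    index = bisect.bisect_right(times, target_time) - 1
    return cues[index] if index >= 0 else None


# Made-up cue values, in the same timestamp units as CueTime.
cues = [(0, 100), (5000, 20000), (10000, 48000)]
print(find_cue(cues, 7000))
```

The reader would then start decoding at the returned Cluster position and discard frames until the requested timestamp is reached.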
In Matroska, there are two kinds of streaming: file access and livestreaming.¶
File access can simply be reading a file located on your computer, but also includes
accessing a file from an HTTP (web) server or CIFS (Windows share) server. These protocols
are usually safe from reading errors and seeking in the stream is possible. However,
when a file is stored far away or on a slow server, seeking can be an expensive operation
and should be avoided. The guidelines in Section 25, when followed, help reduce the number
of seeking operations for regular playback and also let playback start quickly without
needing to read a lot of data first (like a Cues Element
, Attachment Element
or SeekHead Element
).¶
Matroska, having a small overhead, is well suited for storing music/videos on file servers without a big impact on the bandwidth used. Matroska does not require the index to be loaded before playing, which allows playback to start very quickly. The index can be loaded only when seeking is requested the first time.¶
Livestreaming is the equivalent of television broadcasting on the internet. There are two families of servers for livestreaming: RTP/RTSP and HTTP. Matroska is not meant to be used over RTP. RTP already has timing and channel mechanisms that would be wasted if doubled in Matroska. Additionally, having the same information at the RTP and Matroska level would be a source of confusion if they do not match. Livestreaming of Matroska over file-like protocols like HTTP, QUIC, etc. is possible.¶
A live Matroska stream is different from a file because it usually has no known end
(only ending when the client disconnects). For this, all bits of the "size" portion
of the Segment Element MUST be set to 1. Another option is to concatenate Segment Elements
with known sizes, one after the other. This solution allows a change of codec/resolution
between each segment. For example, this allows for a switch between 4:3 and 16:9 in a television program.¶
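As an illustration of the unknown-size rule, the sketch below builds the octets of a Segment Element header whose size field has every data bit set to 1, following the VINT layout of [RFC8794]. The function names and the default of an 8-octet size field are choices made for this example only.

```python
# Segment Element ID, as listed in the "Matroska Element IDs" registry.
SEGMENT_ID = bytes.fromhex("18538067")


def unknown_size(length: int = 8) -> bytes:
    """Return an EBML size field of `length` octets with all data bits set to 1."""
    if not 1 <= length <= 8:
        raise ValueError("EBML size fields are 1 to 8 octets long")
    # The marker bit sits at position 7*length; every bit below it is a data bit.
    return ((1 << (7 * length + 1)) - 1).to_bytes(length, "big")


def live_segment_header(length: int = 8) -> bytes:
    """Return the Segment Element ID followed by an unknown-size field."""
    return SEGMENT_ID + unknown_size(length)
```

A one-octet unknown size is `0xFF`, and an eight-octet one is `0x01` followed by seven `0xFF` octets.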
When Segment Elements are continuous, certain Elements, like SeekHead, Cues,
Chapters, and Attachments, MUST NOT be used.¶
It is possible for a Matroska Player to detect that a stream is not seekable.
If the stream has neither a SeekHead list nor a Cues list at the beginning of the stream,
it SHOULD be considered non-seekable. Even though it is possible to seek forward
in the stream, it is NOT RECOMMENDED.¶
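The detection rule can be sketched as follows. This example assumes that "the beginning of the stream" means the Top-Level Elements that appear before the first Cluster, and that the caller has already parsed their Element IDs in order; both are assumptions of the sketch, as is the function name.

```python
# Element IDs taken from the "Matroska Element IDs" registry.
SEEKHEAD_ID = 0x114D9B74
CUES_ID = 0x1C53BB6B
CLUSTER_ID = 0x1F43B675


def looks_seekable(top_level_ids) -> bool:
    """Return True if a SeekHead or Cues Element precedes the first Cluster.

    `top_level_ids` is an iterable of Top-Level Element IDs in stream order.
    """
    for element_id in top_level_ids:
        if element_id == CLUSTER_ID:
            # Reached the media data without finding any index.
            return False
        if element_id in (SEEKHEAD_ID, CUES_ID):
            return True
    return False
```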
In the context of live radio or web TV, it is possible to "tag" the content while it is
playing. The Tags Element can be placed between Clusters each time it is necessary.
In that case, the new Tags Element MUST reset the previously encountered Tags Elements
and use the new values instead.¶
It is RECOMMENDED that each individual Cluster Element contain no more than
5 seconds or 5 megabytes of content.¶
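A muxer could apply this recommendation with a check like the one below before appending each frame. The sketch assumes millisecond timestamps and takes "5 megabytes" to mean 5,000,000 octets; the text does not say whether a decimal or binary megabyte is meant, so that constant is an assumption, as are the function and parameter names.

```python
MAX_CLUSTER_MS = 5_000
MAX_CLUSTER_BYTES = 5_000_000  # assumption: 1 megabyte = 10**6 octets


def needs_new_cluster(cluster_start_ms: int, cluster_bytes: int,
                      frame_time_ms: int, frame_bytes: int) -> bool:
    """Return True if adding the frame would push the Cluster past either limit."""
    too_long = frame_time_ms - cluster_start_ms >= MAX_CLUSTER_MS
    too_big = cluster_bytes + frame_bytes > MAX_CLUSTER_BYTES
    return too_long or too_big
```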
It is RECOMMENDED that the first SeekHead Element be followed by a Void Element to
allow for the SeekHead Element to be expanded to cover new Top-Level Elements
that could be added to the Matroska file, such as Tags, Chapters, and Attachments Elements.¶
The size of this Void Element should be adjusted depending on whether the Matroska file
already has Tags, Chapters, and Attachments Elements.¶
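One possible way to size and write that Void Element is sketched below. It assumes a worst case of 21 octets per future Seek entry (a 2-octet Seek ID and 1-octet size, a SeekID with a 4-octet payload, and a SeekPosition with an 8-octet payload) and uses the Void Element ID 0xEC from [RFC8794]. The helper names and the sizing heuristic are illustrative assumptions, not requirements of this document.

```python
VOID_ID = b"\xec"       # EBML Void Element ID (RFC 8794)
SEEK_ENTRY_MAX = 21     # assumed worst-case size of one Seek entry, in octets
OPTIONAL_TOP_LEVEL = {"Tags", "Chapters", "Attachments"}


def reserve_size(present) -> int:
    """Octets to reserve for Seek entries of Top-Level Elements not yet written."""
    return SEEK_ENTRY_MAX * len(OPTIONAL_TOP_LEVEL - set(present))


def encode_size(value: int, length: int) -> bytes:
    """Encode `value` as an EBML size field using exactly `length` octets."""
    if not 1 <= length <= 8:
        raise ValueError("EBML size fields are 1 to 8 octets long")
    if value >= (1 << (7 * length)) - 1:
        raise ValueError("value too large (all-ones is reserved for unknown size)")
    return ((1 << (7 * length)) | value).to_bytes(length, "big")


def void_element(total_size: int) -> bytes:
    """Build a Void Element that occupies exactly `total_size` octets."""
    if total_size < 2:
        raise ValueError("a Void Element needs at least an ID and a size octet")
    if total_size - 2 <= 126:
        payload = total_size - 2          # 1-octet ID + 1-octet size
        return VOID_ID + encode_size(payload, 1) + bytes(payload)
    payload = total_size - 9              # 1-octet ID + 8-octet size
    return VOID_ID + encode_size(payload, 8) + bytes(payload)
```

For a file that already contains Tags, `void_element(reserve_size({"Tags"}))` yields a 42-octet placeholder that leaves room for Chapters and Attachments entries.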
While Top-Level Elements can appear in any order, some orderings of Elements are better than others.
Here are a few optimum layouts for different use cases:¶
This is the basic layout muxers should be using for an efficient playback experience.¶
Cues are usually a big chunk of data referencing a lot of locations in the file. Players that want to seek in the file need to seek to the end of the file to access these locations. It is often better if they are placed early in the file. On the other hand, that means players that don't intend to seek will have to read/skip these data no matter what.¶
Because the Cues reference locations further in the file, it's often complicated to allocate the proper space for that element before all the locations are known. Therefore, this layout is rarely used.¶
In livestreaming (Section 23.2), only a few elements make sense. For example, SeekHead and Cues are useless. All elements other than the Clusters MUST be placed before the Clusters.¶
Matroska inherits security considerations from EBML.¶
Attacks on a Matroska Reader could include:¶
Storage of arbitrary and potentially executable data within an Attachment Element.
Matroska Readers that extract or use data from Matroska Attachments SHOULD
check that the data adheres to expectations or not use the attachment.¶
A Matroska Attachment with an inaccurate media type.¶
Damage to the Encryption and Compression fields (Section 14) that would result in bogus binary data interpreted by the decoder.¶
Chapter Codecs running unwanted commands on the host system.¶
The same error handling done for EBML applies to Matroska files.
Particular error handling is not covered in this specification, as it depends on the goal of the Matroska Readers.
It is up to each Matroska Reader to decide how to handle errors and whether they are recoverable in its code.
For example, if the checksum of the \Segment\Tracks is invalid, some could decide to try to read the data anyway,
some will just reject the file, and most will not even check it.¶
Matroska Reader implementations need to be robust against malicious payloads.
Those related to denial of service are outlined in Section 2.1 of [RFC4732].
Although rarer, the same may apply to a Matroska Writer. Malicious stream data
must not cause the Writer to misbehave, as this might allow an attacker access
to transcoding gateways.¶
As an audio and visual container format, a Matroska file or stream will potentially encapsulate numerous byte streams created with a variety of codecs. Implementers will need to consider the security considerations of these encapsulated formats.¶
This document creates a new IANA registry called the "Matroska Element IDs" registry.¶
To register a new Element ID in this registry, one needs an Element ID, a Change Controller (IETF or email of registrant) and an optional Reference to a document describing the Element ID.¶
Element IDs are encoded using the VINT mechanism described in Section 4 of [RFC8794] and can be between one and five octets long. Five-octet-long Element IDs are possible only if declared in the EBML header.¶
Element IDs are described in Section 5 of [RFC8794] with errata 7189 and 7191.¶
One-octet Matroska Element IDs are to be allocated according to the "RFC Required" policy [RFC8126].¶
Two-octet Matroska Element IDs are to be allocated according to the "Specification Required" policy [RFC8126].¶
Three-octet and four-octet Matroska Element IDs are to be allocated according to the "First Come First Served" policy [RFC8126].¶
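The mapping from Element ID length to allocation policy can be illustrated with a short sketch. It relies on the fact that an Element ID value includes its VINT marker bit, so the octet length follows from the numeric range. The function names are invented for this example, and no check is made for reserved IDs.

```python
# Allocation policy per Element ID length, as stated above.
POLICY_BY_OCTETS = {
    1: "RFC Required",
    2: "Specification Required",
    3: "First Come First Served",
    4: "First Come First Served",
}


def element_id_octets(element_id: int) -> int:
    """Return the octet length of an Element ID written with its marker bit."""
    for octets in range(1, 5):
        low = 1 << (7 * octets)        # marker bit alone, e.g. 0x80 for one octet
        high = 1 << (7 * octets + 1)   # one past marker bit plus all data bits
        if low <= element_id < high:
            return octets
    raise ValueError(f"not a 1- to 4-octet Element ID: {element_id:#x}")


def allocation_policy(element_id: int) -> str:
    """Return the registration policy that applies to the given Element ID."""
    return POLICY_BY_OCTETS[element_id_octets(element_id)]
```

For instance, SimpleBlock (0xA3) falls under "RFC Required", Seek (0x4DBB) under "Specification Required", and Segment (0x18538067) under "First Come First Served".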
The allowed values in the "Matroska Element IDs" registry are similar to the ones found in the EBML Element IDs registry defined in Section 17.1 of [RFC8794].¶
EBML IDs defined for the EBML Header -- as defined in Section 17.1 of [RFC8794] -- MUST NOT be used as Matroska Element IDs.¶
Given the scarcity of the One-octet Element IDs, they should only be created to save space for elements found many times in a file, for example, within a BlockGroup or Chapters. The Four-octet Element IDs are mostly for synchronization of large elements. They should only be used for such high-level elements. Elements that are not expected to be used often should use Three-octet Element IDs.¶
Elements found in Section 28 have an assigned Matroska Element ID for historical reasons.
These elements are not in use and SHOULD NOT be reused unless there are no other IDs available with the desired size.
Such IDs are considered as reclaimed to the IANA registry, as they could be used for other things in the future.¶
Matroska Element ID values found in this document are assigned as initial values as follows:¶
Element ID | Element Name | Reference |
---|---|---|
0x80 | ChapterDisplay | Described in Section 5.1.7.1.4.9 |
0x83 | TrackType | Described in Section 5.1.4.1.3 |
0x85 | ChapString | Described in Section 5.1.7.1.4.10 |
0x86 | CodecID | Described in Section 5.1.4.1.21 |
0x88 | FlagDefault | Described in Section 5.1.4.1.5 |
0x8E | Slices | Reclaimed (Section 28.5) |
0x91 | ChapterTimeStart | Described in Section 5.1.7.1.4.3 |
0x92 | ChapterTimeEnd | Described in Section 5.1.7.1.4.4 |
0x96 | CueRefTime | Described in Section 5.1.5.1.2.8 |
0x97 | CueRefCluster | Reclaimed (Section 28.37) |
0x98 | ChapterFlagHidden | Described in Section 5.1.7.1.4.5 |
0x9A | FlagInterlaced | Described in Section 5.1.4.1.28.1 |
0x9B | BlockDuration | Described in Section 5.1.3.5.3 |
0x9C | FlagLacing | Described in Section 5.1.4.1.12 |
0x9D | FieldOrder | Described in Section 5.1.4.1.28.2 |
0x9F | Channels | Described in Section 5.1.4.1.29.3 |
0xA0 | BlockGroup | Described in Section 5.1.3.5 |
0xA1 | Block | Described in Section 5.1.3.5.1 |
0xA2 | BlockVirtual | Reclaimed (Section 28.3) |
0xA3 | SimpleBlock | Described in Section 5.1.3.4 |
0xA4 | CodecState | Described in Section 5.1.3.5.6 |
0xA5 | BlockAdditional | Described in Section 5.1.3.5.2.2 |
0xA6 | BlockMore | Described in Section 5.1.3.5.2.1 |
0xA7 | Position | Described in Section 5.1.3.2 |
0xAA | CodecDecodeAll | Reclaimed (Section 28.22) |
0xAB | PrevSize | Described in Section 5.1.3.3 |
0xAE | TrackEntry | Described in Section 5.1.4.1 |
0xAF | EncryptedBlock | Reclaimed (Section 28.15) |
0xB0 | PixelWidth | Described in Section 5.1.4.1.28.6 |
0xB2 | CueDuration | Described in Section 5.1.5.1.2.4 |
0xB3 | CueTime | Described in Section 5.1.5.1.1 |
0xB5 | SamplingFrequency | Described in Section 5.1.4.1.29.1 |
0xB6 | ChapterAtom | Described in Section 5.1.7.1.4 |
0xB7 | CueTrackPositions | Described in Section 5.1.5.1.2 |
0xB9 | FlagEnabled | Described in Section 5.1.4.1.4 |
0xBA | PixelHeight | Described in Section 5.1.4.1.28.7 |
0xBB | CuePoint | Described in Section 5.1.5.1 |
0xC0 | TrickTrackUID | Reclaimed (Section 28.28) |
0xC1 | TrickTrackSegmentUID | Reclaimed (Section 28.29) |
0xC4 | TrickMasterTrackSegmentUID | Reclaimed (Section 28.32) |
0xC6 | TrickTrackFlag | Reclaimed (Section 28.30) |
0xC7 | TrickMasterTrackUID | Reclaimed (Section 28.31) |
0xC8 | ReferenceFrame | Reclaimed (Section 28.12) |
0xC9 | ReferenceOffset | Reclaimed (Section 28.13) |
0xCA | ReferenceTimestamp | Reclaimed (Section 28.14) |
0xCB | BlockAdditionID | Reclaimed (Section 28.9) |
0xCC | LaceNumber | Reclaimed (Section 28.7) |
0xCD | FrameNumber | Reclaimed (Section 28.8) |
0xCE | Delay | Reclaimed (Section 28.10) |
0xCF | SliceDuration | Reclaimed (Section 28.11) |
0xD7 | TrackNumber | Described in Section 5.1.4.1.1 |
0xDB | CueReference | Described in Section 5.1.5.1.2.7 |
0xE0 | Video | Described in Section 5.1.4.1.28 |
0xE1 | Audio | Described in Section 5.1.4.1.29 |
0xE2 | TrackOperation | Described in Section 5.1.4.1.30 |
0xE3 | TrackCombinePlanes | Described in Section 5.1.4.1.30.1 |
0xE4 | TrackPlane | Described in Section 5.1.4.1.30.2 |
0xE5 | TrackPlaneUID | Described in Section 5.1.4.1.30.3 |
0xE6 | TrackPlaneType | Described in Section 5.1.4.1.30.4 |
0xE7 | Timestamp | Described in Section 5.1.3.1 |
0xE8 | TimeSlice | Reclaimed (Section 28.6) |
0xE9 | TrackJoinBlocks | Described in Section 5.1.4.1.30.5 |
0xEA | CueCodecState | Described in Section 5.1.5.1.2.6 |
0xEB | CueRefCodecState | Reclaimed (Section 28.39) |
0xED | TrackJoinUID | Described in Section 5.1.4.1.30.6 |
0xEE | BlockAddID | Described in Section 5.1.3.5.2.3 |
0xF0 | CueRelativePosition | Described in Section 5.1.5.1.2.3 |
0xF1 | CueClusterPosition | Described in Section 5.1.5.1.2.2 |
0xF7 | CueTrack | Described in Section 5.1.5.1.2.1 |
0xFA | ReferencePriority | Described in Section 5.1.3.5.4 |
0xFB | ReferenceBlock | Described in Section 5.1.3.5.5 |
0xFD | ReferenceVirtual | Reclaimed (Section 28.4) |
0x41A4 | BlockAddIDName | Described in Section 5.1.4.1.17.2 |
0x41E4 | BlockAdditionMapping | Described in Section 5.1.4.1.17 |
0x41E7 | BlockAddIDType | Described in Section 5.1.4.1.17.3 |
0x41ED | BlockAddIDExtraData | Described in Section 5.1.4.1.17.4 |
0x41F0 | BlockAddIDValue | Described in Section 5.1.4.1.17.1 |
0x4254 | ContentCompAlgo | Described in Section 5.1.4.1.31.6 |
0x4255 | ContentCompSettings | Described in Section 5.1.4.1.31.7 |
0x437C | ChapLanguage | Described in Section 5.1.7.1.4.11 |
0x437D | ChapLanguageBCP47 | Described in Section 5.1.7.1.4.12 |
0x437E | ChapCountry | Described in Section 5.1.7.1.4.13 |
0x4444 | SegmentFamily | Described in Section 5.1.2.7 |
0x4461 | DateUTC | Described in Section 5.1.2.11 |
0x447A | TagLanguage | Described in Section 5.1.8.1.2.2 |
0x447B | TagLanguageBCP47 | Described in Section 5.1.8.1.2.3 |
0x4484 | TagDefault | Described in Section 5.1.8.1.2.4 |
0x4485 | TagBinary | Described in Section 5.1.8.1.2.6 |
0x4487 | TagString | Described in Section 5.1.8.1.2.5 |
0x4489 | Duration | Described in Section 5.1.2.10 |
0x44B4 | TagDefaultBogus | Reclaimed (Section 28.43) |
0x450D | ChapProcessPrivate | Described in Section 5.1.7.1.4.16 |
0x45A3 | TagName | Described in Section 5.1.8.1.2.1 |
0x45B9 | EditionEntry | Described in Section 5.1.7.1 |
0x45BC | EditionUID | Described in Section 5.1.7.1.1 |
0x45DB | EditionFlagDefault | Described in Section 5.1.7.1.2 |
0x45DD | EditionFlagOrdered | Described in Section 5.1.7.1.3 |
0x465C | FileData | Described in Section 5.1.6.1.4 |
0x4660 | FileMediaType | Described in Section 5.1.6.1.3 |
0x4661 | FileUsedStartTime | Reclaimed (Section 28.41) |
0x4662 | FileUsedEndTime | Reclaimed (Section 28.42) |
0x466E | FileName | Described in Section 5.1.6.1.2 |
0x4675 | FileReferral | Reclaimed (Section 28.40) |
0x467E | FileDescription | Described in Section 5.1.6.1.1 |
0x46AE | FileUID | Described in Section 5.1.6.1.5 |
0x47E1 | ContentEncAlgo | Described in Section 5.1.4.1.31.9 |
0x47E2 | ContentEncKeyID | Described in Section 5.1.4.1.31.10 |
0x47E3 | ContentSignature | Reclaimed (Section 28.33) |
0x47E4 | ContentSigKeyID | Reclaimed (Section 28.34) |
0x47E5 | ContentSigAlgo | Reclaimed (Section 28.35) |
0x47E6 | ContentSigHashAlgo | Reclaimed (Section 28.36) |
0x47E7 | ContentEncAESSettings | Described in Section 5.1.4.1.31.11 |
0x47E8 | AESSettingsCipherMode | Described in Section 5.1.4.1.31.12 |
0x4D80 | MuxingApp | Described in Section 5.1.2.13 |
0x4DBB | Seek | Described in Section 5.1.1.1 |
0x5031 | ContentEncodingOrder | Described in Section 5.1.4.1.31.2 |
0x5032 | ContentEncodingScope | Described in Section 5.1.4.1.31.3 |
0x5033 | ContentEncodingType | Described in Section 5.1.4.1.31.4 |
0x5034 | ContentCompression | Described in Section 5.1.4.1.31.5 |
0x5035 | ContentEncryption | Described in Section 5.1.4.1.31.8 |
0x535F | CueRefNumber | Reclaimed (Section 28.38) |
0x536E | Name | Described in Section 5.1.4.1.18 |
0x5378 | CueBlockNumber | Described in Section 5.1.5.1.2.5 |
0x537F | TrackOffset | Reclaimed (Section 28.18) |
0x53AB | SeekID | Described in Section 5.1.1.1.1 |
0x53AC | SeekPosition | Described in Section 5.1.1.1.2 |
0x53B8 | StereoMode | Described in Section 5.1.4.1.28.3 |
0x53B9 | OldStereoMode | Described in Section 5.1.4.1.28.5 |
0x53C0 | AlphaMode | Described in Section 5.1.4.1.28.4 |
0x54AA | PixelCropBottom | Described in Section 5.1.4.1.28.8 |
0x54B0 | DisplayWidth | Described in Section 5.1.4.1.28.12 |
0x54B2 | DisplayUnit | Described in Section 5.1.4.1.28.14 |
0x54B3 | AspectRatioType | Reclaimed (Section 28.24) |
0x54BA | DisplayHeight | Described in Section 5.1.4.1.28.13 |
0x54BB | PixelCropTop | Described in Section 5.1.4.1.28.9 |
0x54CC | PixelCropLeft | Described in Section 5.1.4.1.28.10 |
0x54DD | PixelCropRight | Described in Section 5.1.4.1.28.11 |
0x55AA | FlagForced | Described in Section 5.1.4.1.6 |
0x55AB | FlagHearingImpaired | Described in Section 5.1.4.1.7 |
0x55AC | FlagVisualImpaired | Described in Section 5.1.4.1.8 |
0x55AD | FlagTextDescriptions | Described in Section 5.1.4.1.9 |
0x55AE | FlagOriginal | Described in Section 5.1.4.1.10 |
0x55AF | FlagCommentary | Described in Section 5.1.4.1.11 |
0x55B0 | Colour | Described in Section 5.1.4.1.28.16 |
0x55B1 | MatrixCoefficients | Described in Section 5.1.4.1.28.17 |
0x55B2 | BitsPerChannel | Described in Section 5.1.4.1.28.18 |
0x55B3 | ChromaSubsamplingHorz | Described in Section 5.1.4.1.28.19 |
0x55B4 | ChromaSubsamplingVert | Described in Section 5.1.4.1.28.20 |
0x55B5 | CbSubsamplingHorz | Described in Section 5.1.4.1.28.21 |
0x55B6 | CbSubsamplingVert | Described in Section 5.1.4.1.28.22 |
0x55B7 | ChromaSitingHorz | Described in Section 5.1.4.1.28.23 |
0x55B8 | ChromaSitingVert | Described in Section 5.1.4.1.28.24 |
0x55B9 | Range | Described in Section 5.1.4.1.28.25 |
0x55BA | TransferCharacteristics | Described in Section 5.1.4.1.28.26 |
0x55BB | Primaries | Described in Section 5.1.4.1.28.27 |
0x55BC | MaxCLL | Described in Section 5.1.4.1.28.28 |
0x55BD | MaxFALL | Described in Section 5.1.4.1.28.29 |
0x55D0 | MasteringMetadata | Described in Section 5.1.4.1.28.30 |
0x55D1 | PrimaryRChromaticityX | Described in Section 5.1.4.1.28.31 |
0x55D2 | PrimaryRChromaticityY | Described in Section 5.1.4.1.28.32 |
0x55D3 | PrimaryGChromaticityX | Described in Section 5.1.4.1.28.33 |
0x55D4 | PrimaryGChromaticityY | Described in Section 5.1.4.1.28.34 |
0x55D5 | PrimaryBChromaticityX | Described in Section 5.1.4.1.28.35 |
0x55D6 | PrimaryBChromaticityY | Described in Section 5.1.4.1.28.36 |
0x55D7 | WhitePointChromaticityX | Described in Section 5.1.4.1.28.37 |
0x55D8 | WhitePointChromaticityY | Described in Section 5.1.4.1.28.38 |
0x55D9 | LuminanceMax | Described in Section 5.1.4.1.28.39 |
0x55DA | LuminanceMin | Described in Section 5.1.4.1.28.40 |
0x55EE | MaxBlockAdditionID | Described in Section 5.1.4.1.16 |
0x5654 | ChapterStringUID | Described in Section 5.1.7.1.4.2 |
0x56AA | CodecDelay | Described in Section 5.1.4.1.25 |
0x56BB | SeekPreRoll | Described in Section 5.1.4.1.26 |
0x5741 | WritingApp | Described in Section 5.1.2.14 |
0x5854 | SilentTracks | Reclaimed (Section 28.1) |
0x58D7 | SilentTrackNumber | Reclaimed (Section 28.2) |
0x61A7 | AttachedFile | Described in Section 5.1.6.1 |
0x6240 | ContentEncoding | Described in Section 5.1.4.1.31.1 |
0x6264 | BitDepth | Described in Section 5.1.4.1.29.4 |
0x63A2 | CodecPrivate | Described in Section 5.1.4.1.22 |
0x63C0 | Targets | Described in Section 5.1.8.1.1 |
0x63C3 | ChapterPhysicalEquiv | Described in Section 5.1.7.1.4.8 |
0x63C4 | TagChapterUID | Described in Section 5.1.8.1.1.5 |
0x63C5 | TagTrackUID | Described in Section 5.1.8.1.1.3 |
0x63C6 | TagAttachmentUID | Described in Section 5.1.8.1.1.6 |
0x63C9 | TagEditionUID | Described in Section 5.1.8.1.1.4 |
0x63CA | TargetType | Described in Section 5.1.8.1.1.2 |
0x6624 | TrackTranslate | Described in Section 5.1.4.1.27 |
0x66A5 | TrackTranslateTrackID | Described in Section 5.1.4.1.27.1 |
0x66BF | TrackTranslateCodec | Described in Section 5.1.4.1.27.2 |
0x66FC | TrackTranslateEditionUID | Described in Section 5.1.4.1.27.3 |
0x67C8 | SimpleTag | Described in Section 5.1.8.1.2 |
0x68CA | TargetTypeValue | Described in Section 5.1.8.1.1.1 |
0x6911 | ChapProcessCommand | Described in Section 5.1.7.1.4.17 |
0x6922 | ChapProcessTime | Described in Section 5.1.7.1.4.18 |
0x6924 | ChapterTranslate | Described in Section 5.1.2.8 |
0x6933 | ChapProcessData | Described in Section 5.1.7.1.4.19 |
0x6944 | ChapProcess | Described in Section 5.1.7.1.4.14 |
0x6955 | ChapProcessCodecID | Described in Section 5.1.7.1.4.15 |
0x69A5 | ChapterTranslateID | Described in Section 5.1.2.8.1 |
0x69BF | ChapterTranslateCodec | Described in Section 5.1.2.8.2 |
0x69FC | ChapterTranslateEditionUID | Described in Section 5.1.2.8.3 |
0x6D80 | ContentEncodings | Described in Section 5.1.4.1.31 |
0x6DE7 | MinCache | Reclaimed (Section 28.16) |
0x6DF8 | MaxCache | Reclaimed (Section 28.17) |
0x6E67 | ChapterSegmentUUID | Described in Section 5.1.7.1.4.6 |
0x6EBC | ChapterSegmentEditionUID | Described in Section 5.1.7.1.4.7 |
0x6FAB | TrackOverlay | Reclaimed (Section 28.23) |
0x7373 | Tag | Described in Section 5.1.8.1 |
0x7384 | SegmentFilename | Described in Section 5.1.2.2 |
0x73A4 | SegmentUUID | Described in Section 5.1.2.1 |
0x73C4 | ChapterUID | Described in Section 5.1.7.1.4.1 |
0x73C5 | TrackUID | Described in Section 5.1.4.1.2 |
0x7446 | AttachmentLink | Described in Section 5.1.4.1.24 |
0x75A1 | BlockAdditions | Described in Section 5.1.3.5.2 |
0x75A2 | DiscardPadding | Described in Section 5.1.3.5.7 |
0x7670 | Projection | Described in Section 5.1.4.1.28.41 |
0x7671 | ProjectionType | Described in Section 5.1.4.1.28.42 |
0x7672 | ProjectionPrivate | Described in Section 5.1.4.1.28.43 |
0x7673 | ProjectionPoseYaw | Described in Section 5.1.4.1.28.44 |
0x7674 | ProjectionPosePitch | Described in Section 5.1.4.1.28.45 |
0x7675 | ProjectionPoseRoll | Described in Section 5.1.4.1.28.46 |
0x78B5 | OutputSamplingFrequency | Described in Section 5.1.4.1.29.2 |
0x7BA9 | Title | Described in Section 5.1.2.12 |
0x7D7B | ChannelPositions | Reclaimed (Section 28.27) |
0x22B59C | Language | Described in Section 5.1.4.1.19 |
0x22B59D | LanguageBCP47 | Described in Section 5.1.4.1.20 |
0x23314F | TrackTimestampScale | Described in Section 5.1.4.1.15 |
0x234E7A | DefaultDecodedFieldDuration | Described in Section 5.1.4.1.14 |
0x2383E3 | FrameRate | Reclaimed (Section 28.26) |
0x23E383 | DefaultDuration | Described in Section 5.1.4.1.13 |
0x258688 | CodecName | Described in Section 5.1.4.1.23 |
0x26B240 | CodecDownloadURL | Reclaimed (Section 28.21) |
0x2AD7B1 | TimestampScale | Described in Section 5.1.2.9 |
0x2EB524 | UncompressedFourCC | Described in Section 5.1.4.1.28.15 |
0x2FB523 | GammaValue | Reclaimed (Section 28.25) |
0x3A9697 | CodecSettings | Reclaimed (Section 28.19) |
0x3B4040 | CodecInfoURL | Reclaimed (Section 28.20) |
0x3C83AB | PrevFilename | Described in Section 5.1.2.4 |
0x3CB923 | PrevUUID | Described in Section 5.1.2.3 |
0x3E83BB | NextFilename | Described in Section 5.1.2.6 |
0x3EB923 | NextUUID | Described in Section 5.1.2.5 |
0x1043A770 | Chapters | Described in Section 5.1.7 |
0x114D9B74 | SeekHead | Described in Section 5.1.1 |
0x1254C367 | Tags | Described in Section 5.1.8 |
0x1549A966 | Info | Described in Section 5.1.2 |
0x1654AE6B | Tracks | Described in Section 5.1.4 |
0x18538067 | Segment | Described in Section 5.1 |
0x1941A469 | Attachments | Described in Section 5.1.6 |
0x1C53BB6B | Cues | Described in Section 5.1.5 |
0x1F43B675 | Cluster | Described in Section 5.1.3 |
This document creates a new IANA registry called the "Matroska Chapter Codec IDs" registry.
The values correspond to the unsigned integer ChapProcessCodecID value described in Section 5.1.7.1.4.15.¶
To register a new Chapter Codec ID in this registry, one needs a Chapter Codec ID, a Change Controller (IETF or email of registrant) and an optional Reference to a document describing the Chapter Codec ID.¶
The Chapter Codec IDs are to be allocated according to the "First Come First Served" policy [RFC8126].¶
ChapProcessCodecID values of "0" and "1" are RESERVED to the IETF for future use.¶
Matroska files and streams are found in three main forms: audio-video files, audio-only files, and, occasionally, files with stereoscopic video tracks.¶
Historically, Matroska files and streams have used the following media types with an "x-" prefix. For better compatibility, a system SHOULD be able to handle both formats. Newer systems SHOULD NOT use the historic format and SHOULD use the format that follows [RFC6838] instead.¶
Please register three media types; the [RFC6838] templates are below:¶
Additional information:¶
Additional information:¶
Additional information:¶
As Matroska has evolved since 2002, many parts that were considered for use in the format were never used and were often incorrectly designed. Many of the elements that were defined then are not found in any known files but were part of public specs. DivX also had a few custom elements that were designed for custom features.¶
The elements listed here have a known ID that SHOULD NOT be reused, to avoid colliding with existing files. They might be reassigned by IANA in the future if there are no more IDs for a given size. A short description of what each ID was used for is included, but the text is not normative.¶
\Segment\Cluster\BlockGroup\Slices\TimeSlice\Delay¶
\Segment\Cluster\BlockGroup\Slices\TimeSlice\SliceDuration¶
\Segment\Cluster\BlockGroup\ReferenceFrame¶
\Segment\Cluster\BlockGroup\ReferenceFrame\ReferenceOffset¶
\Segment\Cluster\BlockGroup\ReferenceFrame\ReferenceTimestamp¶
\Segment\Cluster\EncryptedBlock¶
\Segment\Tracks\TrackEntry\TrackOffset¶
\Segment\Tracks\TrackEntry\TrackOverlay¶
\Segment\Tracks\TrackEntry\TrickTrackUID¶
\Segment\Tracks\TrackEntry\TrickTrackSegmentUID¶
\Segment\Tracks\TrackEntry\TrickTrackFlag¶
\Segment\Tracks\TrackEntry\TrickMasterTrackUID¶
\Segment\Tracks\TrackEntry\TrickMasterTrackSegmentUID¶
\Segment\Attachments\AttachedFile\FileUsedStartTime¶
\Segment\Attachments\AttachedFile\FileUsedEndTime¶
\Segment\Tags\Tag\+SimpleTag\TagDefaultBogus¶