A Duden dictionary bundle might contain the following files:
- INF. The description of the bundle. It enumerates all included files.
- LD. The description of a particular dictionary. A bundle usually contains a single LD file, which is the case when it includes a single translation direction, say, German → German. Some bundles contain several languages and several (usually two) directions, e.g. German → English and English → German.
- BOF. A single compressed file. Contains Deflate-compressed blocks. Requires an IDX to provide the block offsets.
- IDX. An array of file offsets. Used in conjunction with BOF.
- IDX+BOF Archive. A single compressed file with an index. Either text or binary. Text files are encoded using Duden encoding.
- FSD. Several files concatenated into one. Doesn’t contain the names of the files. Requires an FSI index.
- FSI. An index containing names and offsets into an archive. Used in conjunction with IDX+BOF or FSD files.
- HIC. A tree structure containing headings and article offsets.
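
To illustrate how an IDX file and a BOF file work together, here is a minimal sketch in Python. It rests on assumptions not stated above: that the IDX is a flat array of little-endian 32-bit offsets with no header, that each block runs from its offset to the next one, and that the blocks are raw Deflate streams without a zlib header. The real files may differ in any of these details, and the function names `read_offsets` and `read_block` are purely illustrative.

```python
import struct
import zlib


def read_offsets(idx_bytes: bytes) -> list[int]:
    """Parse IDX data as a flat array of little-endian uint32 offsets.

    Assumption: no header, one 4-byte offset per block.
    """
    count = len(idx_bytes) // 4
    return list(struct.unpack(f"<{count}I", idx_bytes[: count * 4]))


def read_block(bof_bytes: bytes, offsets: list[int], i: int) -> bytes:
    """Decompress block i of the BOF data.

    The block is taken to span offsets[i] up to offsets[i + 1],
    or up to the end of the file for the last block.
    """
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(bof_bytes)
    # wbits=-15 selects raw Deflate without a zlib header (an assumption).
    return zlib.decompress(bof_bytes[start:end], wbits=-15)
```

Because the BOF itself carries no block boundaries in this model, the IDX is what makes random access possible: a single article's block can be decompressed without reading the rest of the archive.
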
Decompiling a Duden dictionary involves several steps:
- Using the information in INF and LD, sort all files into archives.
- Extract the archives and decode article text into Unicode.
- Convert articles from Duden markup format into DSL.
- Decode ADP sound files into WAV.
- Prerender tables into bitmaps, so that they can be inserted into DSL.
The diagram shows that process in more detail.
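
As a rough illustration of the first step, the sketch below groups bundle files into candidate archives. It is a stand-in technique, not the method described above: the actual grouping relies on the contents of INF and LD, whereas this example simply pairs files that share a base name. The file names, the name-based pairing rule, and the helpers `group_by_stem` and `classify` are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_stem(file_names):
    """Group file names by lower-cased base name, keyed by upper-case extension."""
    groups = defaultdict(dict)
    for name in file_names:
        path = PurePath(name)
        ext = path.suffix.lstrip(".").upper()
        groups[path.stem.lower()][ext] = name
    return dict(groups)


def classify(group):
    """Guess the archive kind from the extensions present in one group."""
    exts = set(group)
    if {"IDX", "BOF"} <= exts:
        return "IDX+BOF"  # compressed archive plus its offset index
    if {"FSI", "FSD"} <= exts:
        return "FSD"      # concatenated files plus their name index
    return "other"
```

For example, `group_by_stem(["dict.idx", "dict.bof", "snd.fsi", "snd.fsd"])` yields two groups, which `classify` labels as an IDX+BOF archive and an FSD archive respectively.
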