A Duden dictionary bundle might contain the following files:
- INF. The description of the bundle. It lists all included files.
- LD. The description of a particular dictionary. A bundle with a single translation direction, say German → German, contains a single LD file, which is the usual case. Some bundles cover several languages and several (usually two) directions, e.g. German → English and English → German.
- BOF. A single compressed file. Contains Deflate-compressed blocks. Requires an IDX to provide the block offsets.
- IDX. An array of file offsets. Used in conjunction with BOF.
- IDX+BOF Archive. A single compressed file with an index. Either text or binary. Text files are encoded using Duden encoding.
- FSD. Several files concatenated into one. Doesn’t contain the names of the files. Requires an FSI index.
- FSI. An index containing names and offsets into an archive. Used in conjunction with IDX+BOF or FSD files.
- HIC. A tree structure containing headings and article offsets into IDX+BOF archives.
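
The relationship between IDX and BOF can be illustrated with a minimal sketch. It is not the actual Duden layout: it assumes that the IDX is a flat array of little-endian 32-bit offsets with no header, and that each BOF block is a raw Deflate stream. The names `read_offsets`, `read_block` and `make_archive` are hypothetical helpers introduced only for this example.

```python
import struct
import zlib


def read_offsets(idx_bytes: bytes) -> list[int]:
    """Parse an IDX file as a flat array of block offsets.

    ASSUMPTION: little-endian unsigned 32-bit offsets with no header.
    """
    count = len(idx_bytes) // 4
    return list(struct.unpack(f"<{count}I", idx_bytes[: count * 4]))


def read_block(bof_bytes: bytes, offsets: list[int], i: int) -> bytes:
    """Decompress block i of a BOF file using offsets taken from the IDX."""
    start = offsets[i]
    # The block ends where the next one starts, or at the end of the file.
    end = offsets[i + 1] if i + 1 < len(offsets) else len(bof_bytes)
    # ASSUMPTION: raw Deflate stream without a zlib header (wbits=-15).
    return zlib.decompress(bof_bytes[start:end], wbits=-15)


def make_archive(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Build a synthetic IDX+BOF pair in the assumed layout, for testing."""
    offsets, bof = [], b""
    for block in blocks:
        offsets.append(len(bof))
        compressor = zlib.compressobj(wbits=-15)
        bof += compressor.compress(block) + compressor.flush()
    idx = struct.pack(f"<{len(offsets)}I", *offsets)
    return idx, bof


# Round trip on synthetic data: the BOF alone cannot be split into blocks,
# the IDX supplies the boundaries.
idx, bof = make_archive([b"Haus", b"Baum"])
offsets = read_offsets(idx)
print(read_block(bof, offsets, 1))  # b'Baum'
```

The sketch shows why a BOF is unusable on its own: the compressed blocks carry no delimiters, so the offsets in the IDX are the only way to locate the start of each block.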
Decompiling a Duden dictionary involves several steps:
- Use the information in the INF and LD files to get a list of archives.
- Extract the archives and decode article text from Duden encoding to Unicode.
- Convert articles from Duden markup to DSL.
- Decode ADP sound files into WAV.
- Prerender tables into bitmaps, so that they can be inserted into DSL.
The diagram shows this process in more detail.