A Duden dictionary bundle might contain the following files:

  • INF. The description of the bundle. It lists all included files.
  • LD. The description of a particular dictionary. A bundle usually contains a single LD file, which is the case when it covers a single translation direction, say, German → German. Some bundles cover several languages and several (usually two) directions, e.g. German → English and English → German.
  • BOF. A single compressed file. Contains Deflate-compressed blocks. Requires an IDX to provide the block offsets.
  • IDX. An array of file offsets. Used in conjunction with BOF.
  • IDX+BOF Archive. A single compressed file with an index. Either text or binary. Text files are encoded using Duden encoding.
  • FSD. Several files concatenated into one. Doesn’t contain the names of the files. Requires an FSI index.
  • FSI. An index containing names and offsets into an archive. Used in conjunction with IDX+BOF or FSD files.
  • HIC. A tree structure containing headings and article offsets into IDX+BOF archives.
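
As a rough illustration of how the index files relate to the data files, here is a minimal Python sketch. It is not taken from any actual decompiler, and the on-disk details are assumptions made for the example: IDX is treated as a bare array of 32-bit little-endian offsets, each BOF block as a raw Deflate stream that runs up to the next offset, and the FSI entries as already parsed into (name, offset) pairs. The function names parse_offsets, read_blocks and split_fsd are hypothetical.

```python
import struct
import zlib


def parse_offsets(idx_bytes: bytes) -> list[int]:
    """Read an IDX payload as consecutive offsets.

    Assumption: 32-bit little-endian unsigned integers, no header.
    """
    count = len(idx_bytes) // 4
    return list(struct.unpack(f"<{count}I", idx_bytes[: count * 4]))


def read_blocks(bof_bytes: bytes, offsets: list[int]) -> list[bytes]:
    """Decompress every block of a BOF file.

    Assumption: block i spans offsets[i] up to offsets[i + 1] (or the end
    of the file) and is a raw Deflate stream without a zlib header.
    """
    bounds = list(offsets) + [len(bof_bytes)]
    blocks = []
    for start, end in zip(bounds, bounds[1:]):
        if start == end:
            continue  # skip an empty trailing range
        # wbits=-15 tells zlib to expect raw Deflate data.
        blocks.append(zlib.decompress(bof_bytes[start:end], wbits=-15))
    return blocks


def split_fsd(fsd_bytes: bytes, entries: list[tuple[str, int]]) -> dict[str, bytes]:
    """Cut an FSD file into named pieces.

    Assumption: `entries` holds (name, offset) pairs taken from the FSI,
    and each file runs up to the offset of the next one.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    ends = [offset for _, offset in ordered[1:]] + [len(fsd_bytes)]
    return {name: fsd_bytes[start:end] for (name, start), end in zip(ordered, ends)}
```

If the real blocks carry a zlib header, or the offsets use a different width or byte order, only the wbits argument and the struct format string would need to change.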

Decompiling a Duden dictionary involves several steps:

  • Use the information in the INF and LD files to get a list of archives.
  • Extract the archives and decode article text from Duden encoding to Unicode.
  • Convert articles from Duden markup to DSL.
  • Decode ADP sound files into WAV.
  • Prerender tables into bitmaps, so that they can be inserted into DSL.

The diagram shows this process in more detail.

Overview