A brief analysis of a certain dat format

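superfan89 reverse-engineered this format fourteen years ago, and the mdx dictionaries derived from this software are partly thanks to that work.
Unfortunately the forum is gone, and I could not download the ydDumper tool. Nothing else for it, so I did it myself.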

21世纪大英汉词典&新汉英大词典【11/8/1更新】 - MDict 词库资源区 - MDict Dictionaries - 掌上百科 - PDAWIKI - Powered by Discuz!

struct DictionaryHeader {
    uint64_t unknown_u64_1;
    uint64_t unknown_u64_2;
    uint8_t  name_len;
    char     name[name_len];
    uint64_t unknown_u64_3;
    uint64_t total_entries;
    uint32_t magic;       // Magic bytes (raw 00 C8 00 00)
    uint32_t info_length;
    char     info[info_length];
};
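
A minimal Python sketch of reading the header in the field order listed above. The byte order is an assumption: the struct does not state it, so little-endian is used here because only later fields are explicitly marked big-endian. The function name `read_header` is hypothetical, and `name` and `info` are kept as raw bytes since their text encoding is not given.

```python
import struct
from io import BytesIO

def read_header(stream):
    """Read the DictionaryHeader fields in the order the struct lists them."""
    # Assumption: integers are little-endian (not stated in the struct).
    unknown_1, unknown_2 = struct.unpack("<QQ", stream.read(16))
    (name_len,) = struct.unpack("<B", stream.read(1))
    name = stream.read(name_len)          # encoding unknown, keep raw bytes
    unknown_3, total_entries = struct.unpack("<QQ", stream.read(16))
    magic = stream.read(4)                # raw bytes, 00 C8 00 00 per the comment
    (info_length,) = struct.unpack("<I", stream.read(4))
    info = stream.read(info_length)       # encoding unknown, keep raw bytes
    return {
        "unknown_u64_1": unknown_1,
        "unknown_u64_2": unknown_2,
        "name": name,
        "unknown_u64_3": unknown_3,
        "total_entries": total_entries,
        "magic": magic,
        "info": info,
    }
```

With a real file this would be called as `read_header(open(path, "rb"))`; checking `magic` against `b"\x00\xc8\x00\x00"` is a cheap way to see whether the assumed layout holds.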

struct Level1IndexTable {
    uint32_t index_count;
    Level1IndexItem items[index_count]; 
};

struct Level1IndexItem {
    char      first_key[];
    uint8_t   delimiter;       // Separator (0x09 / Tab)
    uint32_t  idx_offset;      // Offset to the start of Level 2 Index block
    uint32_t  idx_items_count; // Number of items in that L2 block
    uint32_t  idx_len;         // Length (bytes) of the L2 block
    uint32_t  content_offset;  // Offset to the start of Content Data block
    uint32_t  content_len;     // Length (bytes) of the Content Data block
};
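
A rough Python sketch of walking the Level 1 table as laid out above. Several things are assumptions rather than facts from the format: the integers are read as little-endian, the key is assumed to contain no Tab byte of its own, and the items are assumed to be packed back to back with no padding. `read_level1_table` is a hypothetical helper name.

```python
import struct
from collections import namedtuple

Level1IndexItem = namedtuple(
    "Level1IndexItem",
    "first_key idx_offset idx_items_count idx_len content_offset content_len",
)

def read_level1_table(buf, pos=0):
    """Parse index_count followed by that many Level1IndexItem records."""
    # Assumption: little-endian integers, no padding between items.
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    items = []
    for _ in range(count):
        tab = buf.index(b"\t", pos)       # key runs up to the 0x09 delimiter
        first_key = buf[pos:tab]
        fields = struct.unpack_from("<5I", buf, tab + 1)
        items.append(Level1IndexItem(first_key, *fields))
        pos = tab + 1 + 20                # delimiter + five uint32 fields
    return items, pos
```

Each item then gives the offset and length of one Level 2 index block and of its matching content block.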

struct Level2IndexTable {
    Level2IndexItem items[idx_items_count];
};

struct Level2IndexItem {
    char      key_text[];
    uint8_t   terminator;               // Separator (0x09 / Tab)
    uint32_t  offset_in_ContentItem_be; // Relative offset (Big-Endian)
};
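
A small Python sketch of splitting a Level 2 block into its items. It assumes the block has already been decoded with the XOR and inflate rule given at the end of this post, that keys contain no Tab byte, and that `count` is the `idx_items_count` from the matching Level 1 item. `read_level2_items` is a hypothetical name.

```python
import struct

def read_level2_items(block, count):
    """Return (key_text, offset) pairs from a decoded Level 2 index block."""
    items = []
    pos = 0
    for _ in range(count):
        tab = block.index(b"\t", pos)     # key runs up to the 0x09 terminator
        key = block[pos:tab]
        # The offset is big-endian, as the struct comment says.
        (offset,) = struct.unpack_from(">I", block, tab + 1)
        items.append((key, offset))
        pos = tab + 1 + 4                 # terminator + uint32 offset
    return items
```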

struct ContentDataStream {
    byte raw_pool[];
};

struct ContentItem {
    VarInt  data_length_be;  // (Big-Endian)
    char    text[data_length_be & 0x3F];
};

Decoding for Level2IndexItem and ContentItem data: ZLIB_INFLATE( byte[i] XOR ((i * 7) % 0x8897) )
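
The rule above can be sketched in Python roughly like this. Three points are my assumptions, not something the formula states: only the low 8 bits of the key are used (since `(i * 7) % 0x8897` can exceed 0xFF), `i` counts from the start of each block, and the result is a standard zlib stream with a header rather than raw deflate. `xor_mask` and `decode_block` are hypothetical names.

```python
import zlib

def xor_mask(data):
    """XOR every byte with a position-dependent key; applying it twice undoes it."""
    # Assumption: the key is truncated to its low 8 bits.
    return bytes(b ^ (((i * 7) % 0x8897) & 0xFF) for i, b in enumerate(data))

def decode_block(raw):
    """Undo the XOR mask, then inflate the result."""
    # Assumption: a regular zlib stream (with header), not raw deflate.
    return zlib.decompress(xor_mask(raw))
```

If `zlib.decompress` raises an error on real data, the truncation or the starting index is probably wrong, so those are the first things to vary.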

ydDumper.zip (3.7 KB)
I think this should be the one.


Nice. That one is for the old ydic format, so the two complement each other.

amob is pretty active, huh :grin:

spf made quite a few dictionaries early on, but seems to have vanished lately?

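Looking forward to a reasonably complete version. By the way, this dictionary is really good, so why is there no good electronic data for it?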

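Only Youdao and Pleco have it. Youdao's version is poorly done, and Pleco's is hard to extract. Mdx files with mediocre data quality are a dime a dozen, so that is nothing to be surprised about.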