Which HTML format should I save to create mdd/mdx?

I’ve just crawled the word list from Collins French Dictionary | Translations, Definitions and Pronunciations. For each word, I have an associated link, for example French Translation of “love” | Collins English-French Dictionary. I would like to ask:

  • Which HTML format should I save to create mdd/mdx? Is the method below fine?

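    import urllib3
    http = urllib3.PoolManager()  # assuming http is a urllib3 PoolManager
    r = http.request('GET', url)
    data = r.data.decode('utf-8')  # decode() already returns a str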

  • How can I deal with the pronunciation associated with each word?
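
For context on the first question, here is a rough sketch of how I would apply the method above to the whole word list: fetch each link and save the decoded HTML as one UTF-8 file per word. The names `fetch_html`, `save_pages`, `links` and `out_dir` are hypothetical, and I am assuming that `http` is a urllib3 `PoolManager` and that every word is usable as a file name.

```python
from pathlib import Path
from typing import Callable, Dict


def fetch_html(url: str) -> str:
    """Download one page and return it as text (same method as above)."""
    import urllib3  # assumption: the crawler uses urllib3

    http = urllib3.PoolManager()
    r = http.request('GET', url)
    return r.data.decode('utf-8')


def save_pages(links: Dict[str, str], out_dir: str,
               fetch: Callable[[str], str] = fetch_html) -> None:
    """Write one <word>.html file for each word -> URL pair in `links`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for word, url in links.items():
        # Assumes `word` contains no characters that are invalid in file names.
        (out / f"{word}.html").write_text(fetch(url), encoding='utf-8')
```

I would call it as `save_pages(links, "pages")`, where `links` maps each crawled word to its link. The `fetch` parameter is only there so the download step can be swapped out when trying it offline.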

Thank you so much for your elaboration!