In which format should I save the HTML pages to create an mdd/mdx dictionary?

I’ve just crawled the word list from https://www.collinsdictionary.com/dictionary/english-french. Each word has an associated link, for example https://www.collinsdictionary.com/dictionary/english-french/love. I would like to ask:

  • In which format should I save each page's HTML to create the mdd/mdx? Is the method below fine?

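    import urllib3
    http = urllib3.PoolManager()  # assumed: `http` is a urllib3 PoolManager
    r = http.request('GET', url)
    data = r.data.decode('utf-8')  # decode() already returns a str with the page HTML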

  • How can I deal with the pronunciation associated with each word?
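For reference, a common source layout read by MdxBuilder-style tools is a UTF-8 text file where each entry is the headword on one line, its HTML on the next line, and then a line containing only </>. The sketch below writes that layout from (headword, HTML) pairs. It assumes each page has already been cut down to the definition part (the full page with navigation and scripts is not needed), and the function name write_mdx_source is hypothetical.

```python
def write_mdx_source(entries, path):
    """Write (headword, html) pairs as an MDX source text file.

    Layout per entry: headword line, HTML line, then a '</>' separator line.
    """
    with open(path, "w", encoding="utf-8") as f:
        for headword, html in entries:
            # Flatten the HTML onto one line to keep the file layout simple.
            body = " ".join(html.splitlines()).strip()
            f.write(f"{headword}\n{body}\n</>\n")
```

For example, write_mdx_source([("love", data)], "collins_en_fr.txt") produces the text file that is then passed to the builder to compile the .mdx.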
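For the pronunciation, one common approach is to download the audio files, pack them into the .mdd, and point the HTML at them with sound:// links, which MDict resolves against the files stored in the MDD. The sketch below assumes the Collins page exposes each clip as a data-src-mp3="..." attribute (verify this in the actual page source) and that the element carrying it has no href of its own; localize_audio and download_audio are hypothetical helper names.

```python
import os
import posixpath
import re
import urllib.request
from urllib.parse import urlparse

# Assumption: each audio clip appears as data-src-mp3="https://...mp3".
# Adjust this pattern if the real markup differs.
AUDIO_ATTR = re.compile(r'data-src-mp3="([^"]+)"')


def localize_audio(html):
    """Rewrite audio attributes into sound:// links for the MDD.

    Returns the rewritten HTML and a list of (url, filename) pairs
    that still need to be downloaded.
    """
    downloads = []

    def to_sound_link(match):
        url = match.group(1)
        filename = posixpath.basename(urlparse(url).path)
        downloads.append((url, filename))
        # Link to the clip by its file name inside the MDD.
        return f'href="sound://{filename}"'

    return AUDIO_ATTR.sub(to_sound_link, html), downloads


def download_audio(downloads, folder):
    """Save each clip into `folder`, which is later packed into the .mdd."""
    os.makedirs(folder, exist_ok=True)
    for url, filename in downloads:
        target = os.path.join(folder, filename)
        if not os.path.exists(target):  # skip clips already fetched
            urllib.request.urlretrieve(url, target)
```

Run localize_audio on each saved page before writing it into the MDX source, collect all returned pairs, and call download_audio once over them. The resulting folder is what gets packed into the .mdd.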

Thank you so much in advance for any detailed explanation!