Problem scraping dictionary data

I tried the method an expert here posted (Python 2):

I modified the code and tried running it under Python 3. The code that expert provided works quite well, but I want to scrape three links:
https://www.ldoceonline.com/browse/english-spanish/
https://www.ldoceonline.com/browse/spanish-english/

But I couldn't get a single link to scrape successfully. Could someone share code for scraping the Longman English dictionary? Thanks :slightly_smiling_face:

If I manage to scrape it, I'll share the result on this forum for free.

Thank you. I don't know much about Python. I spent a whole day and night trying and I'm about ready to give up :smiling_face_with_tear:
I was hoping someone awesome could send me the finished code. But I still have a long way to go…

Read it from start to finish. It is a fairly simple method, and every step is explained.

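The URL in the first step won't open :sweat_smile:
I have some programming background, but the tutorial seems to start at a fairly high level. I wonder how long it will take me to figure it out.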

I think Python is the easiest way to scrape. Hmm, but thank you.

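Updated. With some patience you can get started in an hour; since you have a programming background, half an hour should be enough.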

Why is there no result at all? :sweat_smile:
address.txt is empty.

Read it carefully from top to bottom. Many people on the forum have picked it up, and they had no background at all.

The URL won't open. Maybe it doesn't support Python?

Source code:

# -*- coding: utf-8 -*-
import requests
from os import path
import time
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://zidian.911cha.com/zi7684.html")
    print(page.content())
    browser.close()

for i, line in enumerate(open("address.txt", encoding="utf-8")):
    filename = str(i) + ".html"  # output file name

    # Skip this URL if its output file already exists
    if path.exists(filename):
        continue

    headers = {
        'authority': 'zidian.911cha.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'upgrade-insecure-requests': '1',
        'dnt': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'referer': 'https://zidian.911cha.com/zi7684.html',
        'accept-language': 'en,ja;q=0.9,zh-CN;q=0.8,zh;q=0.7',
        'cookie': 't=d910e897351658b915c67413eb1d4c2f; r=9132',
    }

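    # Strip the trailing newline before using the line as the URL
    response = requests.get(line.strip(), headers=headers)

    # Print the index, the stripped line, the HTTP status code, and the response length
    print(i, line.strip(), response.status_code, len(
        response.text))

    # Empty pages are sometimes returned, so save only when the status is 200
    # and the response is longer than 1000 characters
    if len(response.text) > 1000 and response.status_code == 200:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response.text)

    # Wait 5 seconds
    time.sleep(5)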

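The output is as follows:

The first part works, but what causes the error in the later part? How do I fix it?

Keep only this part. Get a single URL working first; once that works, change it to read multiple URLs, along the lines of open("address.txt").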

# -*- coding: utf-8 -*-
import os.path
from os import path
import time
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://zidian.911cha.com/zi7684.html")
    with open("zi7684.html", "w", encoding="utf-8") as f:
        f.write(page.content())
    browser.close()
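
The "read multiple URLs" step could look roughly like the sketch below. It assumes address.txt holds one URL per line. The helper names read_urls and save_pages are made up for illustration, and the Playwright import sits inside save_pages so that read_urls can be tried without Playwright installed.

```python
# -*- coding: utf-8 -*-
import time
from os import path


def read_urls(list_file):
    """Return the non-empty lines of list_file with whitespace stripped."""
    with open(list_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def save_pages(urls, delay=5):
    """Open each URL in Chromium and save the rendered HTML as <index>.html."""
    # Imported here so read_urls() works even without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for i, url in enumerate(urls):
            filename = str(i) + ".html"
            if path.exists(filename):  # already downloaded, skip it
                continue
            page.goto(url)
            with open(filename, "w", encoding="utf-8") as f:
                f.write(page.content())
            time.sleep(delay)  # pause between pages
        browser.close()
```

Calling save_pages(read_urls("address.txt")) would then write 0.html, 1.html and so on, reusing one browser for all pages.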


Does a return of 900 mean the site has detected the scraper? The Playwright approach doesn't seem to help.

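Run only the code from my previous post. Get a single URL working first and delete everything else, then check the contents of zi7684.html in the folder. If it looks normal, there is no problem.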
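
The check on zi7684.html can also be done from Python instead of opening the file by hand. This is only a minimal sketch, assuming the page was saved as UTF-8. The helper name summarize_html is hypothetical, and the title lookup is a crude regular expression rather than a real HTML parser.

```python
import re


def summarize_html(filename):
    """Return (character count, page title or None) for a saved HTML file."""
    with open(filename, encoding="utf-8") as f:
        html = f.read()
    # Crude title lookup, good enough for a quick sanity check
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else None
    return len(html), title
```

Running print(summarize_html("zi7684.html")) shows the length and title. A very small count or a missing title would suggest the site returned an error or block page rather than the dictionary entry.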


So this scraped data looks fine, right? What does 261439 mean?


Now that the scrape succeeds, how do I select specific tags to extract?
These annoying ads leave me at a loss, haha.

# -*- coding: utf-8 -*-
import requests
from os import path
import time
from playwright.sync_api import sync_playwright

# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     page = browser.new_page()
#     page.goto("https://www.spanishdict.com/translate/como")
#     with open("translate/como", "w") as f:
#         f.write(page.content())
#     browser.close()

for i, line in enumerate(open("address.txt")):
    filename = 'output/'+str(i) + ".html"  # output file name

    # Skip this URL if its output file already exists
    if path.exists(filename):
        continue

    cookies = {
        'sd_inner_height': '810',
        'sd_locked_widget_views': '%7B%22views%22%3A%201%2C%20%22expires%22%3A%20%222023-12-10T00%3A51%3A49.651Z%22%7D',
        'sd_test_group': '64',
        'sd_session_group2': '44',
    }

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Sec-Fetch-Site': 'same-origin',
        # 'Cookie': 'sd_inner_height=810; sd_locked_widget_views=%7B%22views%22%3A%201%2C%20%22expires%22%3A%20%222023-12-10T00%3A51%3A49.651Z%22%7D; sd_test_group=64; sd_session_group2=44',
        'Sec-Fetch-Dest': 'document',
        'Accept-Language': 'zh-CN,zh-Hans;q=0.9',
        'Sec-Fetch-Mode': 'navigate',
        'Host': 'www.spanishdict.com',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
        'Referer': 'https://www.spanishdict.com/',
        # 'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }

    response = requests.get('https://www.spanishdict.com/translate/como', cookies=cookies, headers=headers)

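    # Print the index, the stripped line, the HTTP status code, and the response length
    print(i, line.strip(), response.status_code, len(
        response.text))

    # Empty pages are sometimes returned, so save only when the status is 200
    # and the response is longer than 1000 characters
    if len(response.text) > 1000 and response.status_code == 200:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response.text)

    # Wait 5 seconds
    time.sleep(5)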

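I want to scrape v0FDaSYd and FODx7hzT (nothing below tdVdlZk8 should be scraped).

I found many elements with the class FODx7hzT, but I don't need the last one.
FODx7hzT is shown in the screenshot below: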
image

Also, one of the tags I scraped has empty content. Is it loaded through an API or something? I don't know how to handle that :sweat_smile:

261439 is the length of the text you scraped. Use BeautifulSoup to extract the tags; the documentation is here:

https://beautifulsoup.readthedocs.io/zh-cn/v4.4.0/
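
For the classes mentioned in the question, a BeautifulSoup extraction might be sketched as follows. The page structure is not shown in this thread, so this assumes v0FDaSYd, FODx7hzT and tdVdlZk8 are plain class names on elements that appear in that order in the document, and that the unwanted FODx7hzT is the last one before tdVdlZk8. The function name extract_entries is made up for illustration.

```python
from bs4 import BeautifulSoup

WANTED = {"v0FDaSYd", "FODx7hzT"}  # classes to collect
STOP = "tdVdlZk8"                  # nothing from this element onward is kept


def extract_entries(html, drop_last="FODx7hzT"):
    """Collect (class, text) pairs for wanted elements that precede STOP."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    # find_all(True) yields every tag in document order
    for tag in soup.find_all(True):
        classes = tag.get("class") or []
        if STOP in classes:
            break
        for name in classes:
            if name in WANTED:
                entries.append((name, tag.get_text(" ", strip=True)))
    # Assumption: the final drop_last element before STOP is the unwanted one
    for idx in range(len(entries) - 1, -1, -1):
        if entries[idx][0] == drop_last:
            del entries[idx]
            break
    return entries
```

Passing the text of a saved file such as output/0.html to extract_entries returns the pairs in page order. If wanted elements are nested inside each other, the inner text will appear more than once, so the selection may need narrowing for the real page.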

OK. Why is the text length the same for every file? Was the rest not read because it exceeded some limit?
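
One likely explanation, judging only from the script posted above: inside the loop, requests.get is always called with the fixed address https://www.spanishdict.com/translate/como, and the line read from address.txt is only printed. Every output file would then hold the same page and so have the same length. Below is a minimal sketch of a loop that uses each line as the URL. The names fetch_lengths and fetch_with_requests are hypothetical, and the headers and cookies are left out for brevity.

```python
def fetch_lengths(lines, fetch):
    """Fetch every non-empty line as a URL and return (url, length) pairs."""
    results = []
    for line in lines:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue
        results.append((url, len(fetch(url))))
    return results


def fetch_with_requests(url):
    """Download one page with requests and return its text."""
    import requests  # imported lazily so fetch_lengths has no dependency

    return requests.get(url, timeout=30).text
```

Opening address.txt and calling fetch_lengths(f, fetch_with_requests) on the file object should then print different lengths for different words, as long as the site serves the pages to the script.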

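For the empty content, you need to use Chrome to analyze the API requests and assemble the request yourself. Below are the example sentences for the word pan: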

https://examples1.spanishdict.com/explore?lang=en&q=pan&numExplorationSentences=100
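
A URL like the one above can be assembled from its three query parameters, and the JSON it returns can be reduced to sentence pairs. The sketch below uses only the standard library instead of requests. The helper names build_url, parse_sentences and fetch_sentences are made up for illustration, and whether the server accepts a request without browser headers is an assumption that has not been checked here.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://examples1.spanishdict.com/explore"


def build_url(word, lang="en", count=100):
    """Assemble the query string in the same order as the URL above."""
    query = urlencode({"lang": lang, "q": word, "numExplorationSentences": count})
    return API + "?" + query


def parse_sentences(payload):
    """Return (source, target) pairs with the <em> markers removed."""
    data = json.loads(payload)
    pairs = []
    for item in data["data"]["sentences"]:
        source = re.sub(r"</?em>", "", item["source"])
        target = re.sub(r"</?em>", "", item["target"])
        pairs.append((source, target))
    return pairs


def fetch_sentences(word):
    """Download and parse the example sentences for one word."""
    with urlopen(build_url(word)) as resp:
        return parse_sentences(resp.read().decode("utf-8"))
```

The JSON that follows is the response shown for pan. Feeding that text to parse_sentences would give the English and Spanish sentences without the <em> highlighting.
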
{
    "params": {
        "q": "pan",
        "lang": "en",
        "numExplorationSentences": 100
    },
    "data": {
        "totalHits": 10000,
        "sentences": [
            {
                "id": "AF4OgocBLC2PqRpsR-va",
                "source": "Add 1 cup of water to the <em>pan</em> and stir.",
                "target": "Añadir 1 taza de agua a la <em>sartén</em> y revuelva.",
                "corpus": "paracrawl"
            },
            {
                "id": "TpXigYcBATG92IlvAZkJ",
                "source": "Reduce the heat and cover the <em>pan</em> with a lid.",
                "target": "Reducir el fuego y cubrir la <em>sartén</em> con una tapa.",
                "corpus": "paracrawl"
            },
            {
                "id": "kqLxgYcBmWHIBpHrpmuK",
                "source": "For instance, use both hands to lift a heavy <em>pan</em>.",
                "target": "Por ejemplo, usar ambas manos para levantar una <em>sartén</em> pesada.",
                "corpus": "paracrawl"
            },
            {
                "id": "G2UhgocBLC2PqRpsW3XC",
                "source": "Heat the oil in a <em>pan</em> and sauté the onion.",
                "target": "Calentar el aceite en una <em>sartén</em> y saltear la cebolla.",
                "corpus": "paracrawl"
            },
            {
                "id": "1oOVgYcBmWHIBpHrvO2l",
                "source": "We put everything in a <em>pan</em> with water and salt.",
                "target": "Lo ponemos todo en una <em>cazuela</em> con agua y sal.",
                "corpus": "paracrawl"
            },
            {
                "id": "QrYtgocBmWHIBpHrHRUc",
                "source": "Break the tofu into chunks and add to the <em>pan</em>.",
                "target": "Romper el tofu en trozos y añadir a la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "oIvFgYcBATG92IlvTP9i",
                "source": "Put the apples in a <em>pan</em> and fry in butter.",
                "target": "Poner las manzanas en una <em>sartén</em> y freír en mantequilla.",
                "corpus": "paracrawl"
            },
            {
                "id": "An2BgYcBmWHIBpHr-Dyh",
                "source": "Take a <em>pan</em> and pour a cup of strong coffee.",
                "target": "Tomar una <em>sartén</em> y vierta una taza de café fuerte.",
                "corpus": "paracrawl"
            },
            {
                "id": "gV4NgocBLC2PqRpsbp7x",
                "source": "Add the jackfruit to the <em>pan</em> in a single layer.",
                "target": "Añadir la jaca a la <em>sartén</em> en una sola capa.",
                "corpus": "paracrawl"
            },
            {
                "id": "D60ogocBATG92Ilv1iWq",
                "source": "Peel, wash and chop the potatoes, put in a <em>pan</em>.",
                "target": "Pelar, lavar y picar las papas, poner en una <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "0YJ2gocBLC2PqRpsFgvg",
                "source": "Back to cover the <em>pan</em> and leave another 25 minutes.",
                "target": "Volver a cubrir la <em>cacerola</em> y dejarlo otros 25 minutos.",
                "corpus": "paracrawl"
            },
            {
                "id": "22McgocBLC2PqRpsjsme",
                "source": "Remove the garlic from the <em>pan</em> and cut into slices.",
                "target": "Retire el ajo de la <em>sartén</em> y cortar en rodajas.",
                "corpus": "paracrawl"
            },
            {
                "id": "CuO2gocBmWHIBpHr56qd",
                "source": "Uncover the <em>pan</em> and add the gnocchi, stirring to coat.",
                "target": "Destape la <em>sartén</em> y añadir los ñoquis, revolviendo para cubrir.",
                "corpus": "paracrawl"
            },
            {
                "id": "_y9_gYcBLC2PqRpscreq",
                "source": "Boil the rice in a large <em>pan</em> with salted water.",
                "target": "Cuece el arroz en una <em>sartén</em> grande con agua salada.",
                "corpus": "paracrawl"
            },
            {
                "id": "JIi6gYcBATG92IlvTlN6",
                "source": "Add more oil to the <em>pan</em> if it is dry.",
                "target": "Agregue más aceite a la <em>sartén</em> si está seco.",
                "corpus": "paracrawl"
            },
            {
                "id": "mYGkgYcBATG92Ilv8CFn",
                "source": "Cover the <em>pan</em> and cook the ingredients for 15 minutes.",
                "target": "Cubre la <em>sartén</em> y cocina los ingredientes durante 15 minutos.",
                "corpus": "paracrawl"
            },
            {
                "id": "GbEegocBmWHIBpHrYCvd",
                "source": "Remove from the <em>pan</em> and serve however you choose.",
                "target": "Retirar de la <em>sartén</em> y servir sin embargo usted elige.",
                "corpus": "paracrawl"
            },
            {
                "id": "R5HUgYcBATG92IlvfyOL",
                "source": "Fry in a <em>pan</em> with oil, cook on both sides.",
                "target": "Fríalo en una <em>sartén</em> con aceite, dorándolos por ambos lados.",
                "corpus": "paracrawl"
            },
            {
                "id": "y6kdgocBATG92IlvKFo4",
                "source": "In a <em>pan</em> on medium high temperature, add the oil.",
                "target": "En un <em>sartén</em> en temperatura mediana alta, agrega el aceite.",
                "corpus": "paracrawl"
            },
            {
                "id": "UmkrgocBLC2PqRpsvQP1",
                "source": "Cover a <em>pan</em> with a lid to smother the flames.",
                "target": "Cubra el <em>sartén</em> con una tapa para sofocar las llamas.",
                "corpus": "paracrawl"
            },
            {
                "id": "mouTgocBLC2PqRpsLrEz",
                "source": "Heat about 2 inches of oil in a <em>pan</em> or skillet.",
                "target": "Calor sobre 2 pulgadas de aceite en una <em>cacerola</em> o sartén.",
                "corpus": "paracrawl"
            },
            {
                "id": "e68agocBmWHIBpHrkO1m",
                "source": "Preheat a nonstick <em>pan</em> to medium heat for 3 minutes.",
                "target": "Precalienta una <em>sartén</em> antiadherente a temperatura media por 3 minutos.",
                "corpus": "paracrawl"
            },
            {
                "id": "SY_PgYcBATG92IlvJEYE",
                "source": "Sauté the vegetables in a <em>pan</em> with olive oil.",
                "target": "Saltear las verduras en una <em>sartén</em> con aceite de oliva.",
                "corpus": "paracrawl"
            },
            {
                "id": "vbMkgocBmWHIBpHrHQ4Z",
                "source": "Put the oil in a <em>pan</em> and sauté the onion.",
                "target": "Coloque el aceite en una <em>sarten</em> y rehogue la cebolla.",
                "corpus": "paracrawl"
            },
            {
                "id": "Xvr7gocBmWHIBpHr2Oac",
                "source": "Add 2 tablespoons of Ghee (clarified butter) in the <em>pan</em>.",
                "target": "Agrega 2 cucharadas de ghee (mantequilla clarificada) en la <em>cacerola</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "pC56gYcBLC2PqRpsaAvG",
                "source": "Served in the <em>pan</em> on a bed of vegetables.",
                "target": "Servido en la <em>sartén</em> sobre una cama de verduras.",
                "corpus": "paracrawl"
            },
            {
                "id": "GloAgocBLC2PqRpsYy1Y",
                "source": "Melt the butter in a non-stick <em>pan</em> on medium heat.",
                "target": "Derrite la mantequilla en una <em>sartén</em> antiadherente a fuego medio.",
                "corpus": "paracrawl"
            },
            {
                "id": "T853gocBmWHIBpHrZoYv",
                "source": "Melt the butter in a <em>pan</em> and brown the garlic.",
                "target": "Derrite la mantequilla en una <em>sartén</em> y dora los ajos.",
                "corpus": "paracrawl"
            },
            {
                "id": "1S58gYcBLC2PqRpsockm",
                "source": "Then, put the oil in a large <em>pan</em> and heat it.",
                "target": "Luego, poner el aceite en una <em>cacerola</em> grande y calentarlo.",
                "corpus": "paracrawl"
            },
            {
                "id": "p8NVgocBmWHIBpHrV1GN",
                "source": "Heat the oil in a non-stick <em>pan</em> over medium-high heat.",
                "target": "Calentar el aceite en una <em>sartén</em> antiadherente a fuego medio-alto.",
                "corpus": "paracrawl"
            },
            {
                "id": "AQQZg4cBmWHIBpHrrtm1",
                "source": "Add to compare 28) 744 Yellow ochre watercolor <em>pan</em> Cotman.",
                "target": "Añadir para comparar 28) 744 Ocre amarillo acuarela <em>pastilla</em> Cotman.",
                "corpus": "paracrawl"
            },
            {
                "id": "Y7pSgocBATG92Ilvc9__",
                "source": "Pour the <em>pan</em> oil, and thenput a tablespoon of dough.",
                "target": "Verter el aceite de <em>sartén</em>, y luegoponer una cucharada de masa.",
                "corpus": "paracrawl"
            },
            {
                "id": "JJv0gYcBATG92Ilv8PUE",
                "source": "Stir the solution and quickly pour it into the <em>pan</em>.",
                "target": "Agitar la solución y rápidamente se vierte en el <em>molde</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "taYTgocBATG92IlvYjOr",
                "source": "Heat the oil in a large <em>pan</em> over medium-high heat.",
                "target": "Calentar el aceite en una <em>sartén</em> grande a fuego medio-alto.",
                "corpus": "paracrawl"
            },
            {
                "id": "0ZblgYcBATG92Ilvq9t6",
                "source": "Add a few spoons of marinara sauce to the <em>pan</em>.",
                "target": "Añadir unas cucharadas de salsa marinara a la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "QnJ7gYcBATG92IlvLfz2",
                "source": "Remove the <em>pan</em> from the heat, add salt and pepper.",
                "target": "Retire la <em>sartén</em> del fuego, añadir sal y pimienta.",
                "corpus": "paracrawl"
            },
            {
                "id": "R7gZg4cBLC2PqRpsACX4",
                "source": "Add to compare 21) 329 Intense green watercolor <em>pan</em> Cotman.",
                "target": "Añadir para comparar 21) 329 Verde intenso acuarela <em>pastilla</em> Cotman.",
                "corpus": "paracrawl"
            },
            {
                "id": "OJbjgYcBATG92Ilvcx3p",
                "source": "After Put them in a deep <em>pan</em> and cover with water.",
                "target": "Después Ponerlos en una <em>sartén</em> honda y cubrir con agua.",
                "corpus": "paracrawl"
            },
            {
                "id": "25ragYcBmWHIBpHrI3Uy",
                "source": "FRY in a <em>pan</em> with a little oil and serve.",
                "target": "Freímos en una <em>sartén</em> con un poco de aceite y servimos.",
                "corpus": "paracrawl"
            },
            {
                "id": "8lHlgYcBLC2PqRpshnFo",
                "source": "Meanwhile, peel the potatoes and fry them in another <em>pan</em>.",
                "target": "Mientras, pelamos las patatas y las freímos en otra <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "-EK4gYcBLC2PqRpsx5xE",
                "source": "Add more oil if the <em>pan</em> is too dry.",
                "target": "Agregue más aceite si el <em>pan</em> está demasiado seco.",
                "corpus": "paracrawl"
            },
            {
                "id": "D35qgocBLC2PqRpsylB3",
                "source": "Melt the apple jelly in a <em>pan</em> and pour over.",
                "target": "Derretir la gelatina de manzana en una <em>cacerola</em> y verter sobre.",
                "corpus": "paracrawl"
            },
            {
                "id": "8bAdgocBmWHIBpHroPOW",
                "source": "Add peppers and other cut vegetables to the <em>pan</em>.",
                "target": "Añade los pimientos y otras verduras cortadas a la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "SP0Zg4cBATG92IlvORsF",
                "source": "Add to compare 19) 327 Intense blue watercolor <em>pan</em> Cotman.",
                "target": "Añadir para comparar 19) 327 Azul intenso acuarela <em>pastilla</em> Cotman.",
                "corpus": "paracrawl"
            },
            {
                "id": "1abkgocBLC2PqRpsopHi",
                "source": "Pour into Bundt <em>pan</em> and bake for 45 minutes.",
                "target": "Verter en <em>el molde</em> Bundt y hornear durante 45 minutos.",
                "corpus": "paracrawl"
            },
            {
                "id": "_9qbgocBmWHIBpHrHWoC",
                "source": "Remove <em>pan</em> from oven and allow to cool a bit.",
                "target": "Retire <em>la sartén</em> del horno y dejar enfriar un poco.",
                "corpus": "paracrawl"
            },
            {
                "id": "l7UrgocBmWHIBpHrb3m7",
                "source": "When cooked, pour a drop of cognac in the <em>pan</em>.",
                "target": "Cuando estén cocinadas, añadir una gota de coñac en la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "JWszgocBLC2PqRpsAHgE",
                "source": "Remove from the <em>pan</em> using a slotted spoon and drain well.",
                "target": "Quitar de la <em>sartén</em> con una espumadera y escurrir bien.",
                "corpus": "paracrawl"
            },
            {
                "id": "G2UhgocBLC2PqRpsmYt1",
                "source": "Grind or pulverize the ingredients and put them in a <em>pan</em>.",
                "target": "Muela o pulverice los ingredientes y póngalos en una <em>cacerola</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "tNGUgocBATG92Ilv7zRv",
                "source": "To put slices in a bowl or the enameled <em>pan</em>.",
                "target": "Poner los trozos en la escudilla o la <em>cacerola</em> esmaltada.",
                "corpus": "paracrawl"
            },
            {
                "id": "-3hYgocBLC2PqRps7mPL",
                "source": "Cherry tomatoes into quarters and also give in the frying <em>pan</em>.",
                "target": "Tomates cherry en cuartos y también dar en la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "4iyOgIcBmWHIBpHrEyu3",
                "source": "A leak in the kitchen with a <em>pan</em> under it.",
                "target": "Una fuga en la cocina con una <em>sartén</em> debajo de ella.",
                "corpus": "open-subtitles"
            },
            {
                "id": "QrYtgocBmWHIBpHrMBun",
                "source": "Cut the courgettes and heat the vegetable oil in a <em>pan</em>.",
                "target": "Cortar los calabacines y calentar el aceite en una <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "PLUqgocBmWHIBpHrJgsk",
                "source": "Place artichokes in a large <em>pan</em> and cover with water.",
                "target": "Colocar las alcachofas en una <em>olla</em> grande y cubrirlas con agua.",
                "corpus": "paracrawl"
            },
            {
                "id": "D1f3gYcBLC2PqRpsEyFy",
                "source": "Then add the onions and carrots to the <em>pan</em>.",
                "target": "Luego agrega las cebollas y las zanahorias a la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "Fs2KgocBATG92IlvotGu",
                "source": "This cast iron <em>pan</em> is simple but unbelievably practical.",
                "target": "Esta <em>sartén</em> de hierro fundido es simple pero increíblemente práctico.",
                "corpus": "paracrawl"
            },
            {
                "id": "5a77gocBLC2PqRps6Wrh",
                "source": "Add 1 teaspoon margarine and swirl the <em>pan</em> to distribute.",
                "target": "Agregue 1 cucharadita de margarina y revuelva la <em>sartén</em> para distribuirla.",
                "corpus": "paracrawl"
            },
            {
                "id": "zIehgYcBmWHIBpHrQ6bO",
                "source": "Cover the <em>pan</em> and steep the tea for 10 minutes.",
                "target": "Cubre la <em>olla</em> y carga el té por 10 minutos.",
                "corpus": "paracrawl"
            },
            {
                "id": "I2cmgocBLC2PqRps7V2M",
                "source": "If the <em>pan</em> gets dry, add water in small increments.",
                "target": "Si el <em>pan</em> se seca, agregar el agua en pequeños incrementos.",
                "corpus": "paracrawl"
            },
            {
                "id": "RpzegYcBmWHIBpHruQtO",
                "source": "A cast iron <em>pan</em> is the next best option.",
                "target": "Una <em>sartén</em> de hierro fundido es la siguiente mejor opción.",
                "corpus": "paracrawl"
            },
            {
                "id": "p60qgocBATG92Ilvl7hr",
                "source": "Vegetables with garlic and olive oil, fried in a <em>pan</em>.",
                "target": "Vegetales con ajos y aceite de oliva fritos en una <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "IjmdgYcBLC2PqRps_acH",
                "source": "Fry them in oil with the curry in a large <em>pan</em>.",
                "target": "Fríelos en aceite con el curry en una <em>sartén</em> grande.",
                "corpus": "paracrawl"
            },
            {
                "id": "TJfngYcBATG92IlvHVLK",
                "source": "They can be poked from frozen right in the <em>pan</em>.",
                "target": "Pueden ser asomaban de derecha congelada en la <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "vp3kgYcBmWHIBpHrhfBk",
                "source": "Remove from <em>pan</em> and place in a large bowl.",
                "target": "Retire de <em>la sartén</em> y coloque en un tazón grande.",
                "corpus": "paracrawl"
            },
            {
                "id": "OFDhgYcBLC2PqRps_0G5",
                "source": "Place in a <em>pan</em> with the oil and sauteé.",
                "target": "Poner en una <em>cacerola</em> con el aceite y rehogar.",
                "corpus": "paracrawl"
            },
            {
                "id": "tWEXgocBLC2PqRpsIPxG",
                "source": "Press dough into bottom and sides of a large <em>pan</em>.",
                "target": "Presione la pasta en fondo y lados de una <em>cacerola</em> grande.",
                "corpus": "paracrawl"
            },
            {
                "id": "lLcWg4cBLC2PqRpsyGUz",
                "source": "Add to compare 05) 238 Gomaguta watercolor <em>pan</em> Van Gogh.",
                "target": "Añadir para comparar 05) 238 Gomaguta acuarela <em>pastilla</em> Van Gogh.",
                "corpus": "paracrawl"
            },
            {
                "id": "krcWg4cBLC2PqRpsyGUz",
                "source": "Add to compare 37) 416 Sepia watercolor <em>pan</em> Van Gogh.",
                "target": "Añadir para comparar 37) 416 Sepia acuarela <em>pastilla</em> Van Gogh.",
                "corpus": "paracrawl"
            },
            {
                "id": "k7cWg4cBLC2PqRpsyGUz",
                "source": "Add to compare 09) 311 Vermilion watercolor <em>pan</em> Van Gogh.",
                "target": "Añadir para comparar 09) 311 Bermellon acuarela <em>pastilla</em> Van Gogh.",
                "corpus": "paracrawl"
            },
            {
                "id": "KdSIgocBmWHIBpHrRC75",
                "source": "Also add in the frying <em>pan</em> 1/4 tablespoon of salt.",
                "target": "También agregue en la <em>sartén</em> 1/4 de cucharada de sal.",
                "corpus": "paracrawl"
            },
            {
                "id": "bdB9gocBmWHIBpHrCWQ7",
                "source": "Add in the frying <em>pan</em> 1 cup of refried beans.",
                "target": "Agregue en la <em>sartén</em> 1 taza de frijoles refritos.",
                "corpus": "paracrawl"
            },
            {
                "id": "K_0Zg4cBATG92IlvhjOl",
                "source": "Add to compare 46) 585 Indanthrene blue watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 46) 585 Azul indantreno acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "C3qRgYcBATG92IlveHTt",
                "source": "The wok <em>pan</em> is used in more and more kitchens.",
                "target": "La <em>sartén</em> wok se usa en más y más cocinas.",
                "corpus": "paracrawl"
            },
            {
                "id": "YlscgYcBmWHIBpHrcHoI",
                "source": "If your dreams don't <em>pan</em> out, it's a viable option.",
                "target": "Si tus sueños no se <em>cumplen</em>, es una opción viable.",
                "corpus": "open-subtitles"
            },
            {
                "id": "82gpgocBLC2PqRps1Wjl",
                "source": "Put everything in a frying <em>pan</em> (a wookpan if available).",
                "target": "Puesto todo en un <em>sartén</em> (un wookpan si está disponible).",
                "corpus": "paracrawl"
            },
            {
                "id": "mgQZg4cBmWHIBpHridWG",
                "source": "Add to compare 16) 211 Cadmium orange watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 16) 211 Anaranjado cadmio acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "S0zVgYcBLC2PqRps8jkF",
                "source": "The prawns in a frying <em>pan</em> on each side about.",
                "target": "Las gambas en una <em>sartén</em> a cada lado sobre.",
                "corpus": "paracrawl"
            },
            {
                "id": "obgZg4cBLC2PqRps1XVS",
                "source": "Add to compare 33) 539 Cobalt violet watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 33) 539 Violeta cobalto acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "5wQZg4cBmWHIBpHrEq-J",
                "source": "Add to compare 40) 511 Cobalt blue watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 40) 511 Azul cobalto acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "j7gZg4cBLC2PqRpsFC0t",
                "source": "Add to compare 51) 662 Permanent green watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 51) 662 Verde permanente acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "wLgZg4cBLC2PqRpsTUAj",
                "source": "Add to compare 48) 522 Turquoise blue watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 48) 522 Azul turquesa acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "b7gZg4cBLC2PqRpsc0-a",
                "source": "Add to compare 49) 640 Bluish green watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 49) 640 Verde azulado acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "vf0Zg4cBATG92IlvdC-u",
                "source": "Add to compare 41) 534 Cerulean blue watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 41) 534 Azul ceruleo acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "TfwWg4cBATG92IlvoUXS",
                "source": "Add to compare 27) 616 Viridian watercolor <em>pan</em> Van Gogh.",
                "target": "Añadir para comparar 27) 616 Verde esmeralda acuarela <em>pastilla</em> Van Gogh.",
                "corpus": "paracrawl"
            },
            {
                "id": "L4vFgYcBATG92IlvE-hb",
                "source": "Bring oil to a high temperature in a frying <em>pan</em>.",
                "target": "Llevar el aceite a una temperatura alta en una <em>sartén</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "2n-fgYcBATG92Ilv0WGj",
                "source": "Heat the oil in a frying <em>pan</em> and brown the meat.",
                "target": "Calentar el aceite en una <em>sartén</em> y dorar la carne.",
                "corpus": "paracrawl"
            },
            {
                "id": "38ZegocBmWHIBpHru0Qj",
                "source": "Meanwhile, heat a frying <em>pan</em> and spray with canola oil.",
                "target": "Mientras tanto, calentar una <em>sartén</em> y rociar con aceite de canola.",
                "corpus": "paracrawl"
            },
            {
                "id": "yAQZg4cBmWHIBpHrTLkF",
                "source": "Add to compare 17) 266 Permanent orange watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 17) 266 Anaranjado permanente acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "Lv0Zg4cBATG92Ilvw0ww",
                "source": "Add to compare 36) 507 Ultramarine violet watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 36) 507 Violeta ultramar acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "qAQZg4cBmWHIBpHrrti1",
                "source": "Add to compare 47) 533 Indigo watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 47) 533 Indigo acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "Cf0Zg4cBATG92IlvTiIi",
                "source": "Add to compare 11) 242 Aureoline watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 11) 242 Aureolina acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "67QogocBmWHIBpHrYXIS",
                "source": "Put a wok or a <em>pan</em> over very high heat.",
                "target": "Coloca un wok o una <em>sartén</em> a fuego muy alto.",
                "corpus": "paracrawl"
            },
            {
                "id": "Y6MKgocBATG92IlvnEel",
                "source": "Place the bacon in a <em>pan</em> over medium low heat.",
                "target": "Colocar el tocino en una <em>sartén</em> a fuego medio bajo.",
                "corpus": "paracrawl"
            },
            {
                "id": "27gZg4cBLC2PqRpsOzkH",
                "source": "Add to compare 18) 311 Vermilion watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 18) 311 Bermellon acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "8jGGgYcBLC2PqRpsPfWA",
                "source": "A young woman chooses a non-stick frying <em>pan</em> in the store.",
                "target": "Una mujer joven elige un <em>sartén</em> antiadherente en la tienda.",
                "corpus": "paracrawl"
            },
            {
                "id": "j7Q9gocBATG92Ilv1SSm",
                "source": "Add in the frying <em>pan</em> 1/2 teaspoon of sugar.",
                "target": "Agregue en la <em>sartén</em> 1/2 cucharadita de azúcar.",
                "corpus": "paracrawl"
            },
            {
                "id": "QgQZg4cBmWHIBpHrr984",
                "source": "Add to compare 34) 532 Mauve watercolor <em>pan</em> Rembrandt.",
                "target": "Añadir para comparar 34) 532 Malva acuarela <em>pastilla</em> Rembrandt.",
                "corpus": "paracrawl"
            },
            {
                "id": "U8tugocBmWHIBpHrAmpV",
                "source": "In a frying <em>pan</em> ignited half cup of buckwheat, cool.",
                "target": "En una <em>sartén</em> se enciende media taza de trigo sarraceno, fresco.",
                "corpus": "paracrawl"
            },
            {
                "id": "Tn2agYcBATG92IlvBWIY",
                "source": "Dry the bottom of the mold <em>pan</em> with a towel.",
                "target": "Seca la parte inferior del <em>molde</em> con una toalla.",
                "corpus": "paracrawl"
            },
            {
                "id": "BasMgocBmWHIBpHrl1tn",
                "source": "Heat the vegetable stock or water in a sauce <em>pan</em>.",
                "target": "Calentar el caldo de verduras o agua en una <em>cacerola</em>.",
                "corpus": "paracrawl"
            }
        ],
        "translations": [
            {
                "translation": "sartén",
                "count": 1741,
                "isFacet": true
            },
            {
                "translation": "la sartén",
                "count": 1656,
                "isFacet": true
            },
            {
                "translation": "cacerola",
                "count": 954,
                "isFacet": true
            },
            {
                "translation": "molde",
                "count": 503,
                "isFacet": true
            },
            {
                "translation": "olla",
                "count": 310,
                "isFacet": true
            },
            {
                "translation": "pastilla",
                "count": 76,
                "isFacet": false
            }
        ]
    }
}

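If every response has the same content length, the scrape has usually gone wrong: the server may have detected that you are a crawler and is returning the same content each time. Check the saved text yourself to see whether it is actually what you need.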
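One rough way to run that check is to hash each saved page and see which ones are byte-for-byte identical. This is only a sketch: the function name `find_duplicate_bodies` and the sample page contents are made up for illustration, and in practice you would read the text from your own saved `.html` files.

```python
import hashlib
from collections import defaultdict


def find_duplicate_bodies(pages):
    """Group page labels whose bodies are identical.

    pages: dict mapping a label (file name or URL) to the page text.
    Returns a list of label groups that share exactly the same body.
    """
    groups = defaultdict(list)
    for label, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(label)
    # Only groups with more than one label are suspicious.
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]


# Hypothetical sample data standing in for the saved files.
pages = {
    "0.html": "<html>blocked</html>",
    "1.html": "<html>blocked</html>",
    "2.html": "<html>real entry for pan</html>",
}
duplicates = find_duplicate_bodies(pages)
print(duplicates)  # [['0.html', '1.html']]
```

If most of your files end up in one group, the server is probably sending a block page or a default page rather than the entries you asked for.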

response = requests.get('https://www.spanishdict.com/translate/como', cookies=cookies, headers=headers)

Look at your code: the URL is hard-coded. You need to replace it with the URL paths from the text file.
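A minimal sketch of that change, assuming `address.txt` holds one full URL per line. The helper names `read_urls`, `fetch_all`, and `fetch_with_requests` are hypothetical, and the `headers` and `cookies` arguments stand for the dictionaries already defined in your script.

```python
def read_urls(lines):
    """Strip newlines and surrounding spaces, and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_all(urls, fetch):
    """Call fetch(url) once per URL and return {url: page text}."""
    return {url: fetch(url) for url in urls}


def fetch_with_requests(url, headers=None, cookies=None):
    """Download one page; the URL comes from the caller, not a constant."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, headers=headers, cookies=cookies, timeout=30)
    response.raise_for_status()
    return response.text


# Offline demonstration with a fake fetcher, so nothing is downloaded here.
sample_lines = [
    "https://www.spanishdict.com/translate/como\n",
    "\n",
    "https://www.spanishdict.com/translate/pan\n",
]
urls = read_urls(sample_lines)
pages = fetch_all(urls, lambda url: "fake page for " + url)
print(len(pages), "pages fetched")
```

In the real script you would open the file with `open("address.txt", encoding="utf-8")`, pass the file object to `read_urls`, and pass `fetch_with_requests` to `fetch_all`. Note the `strip()`: a line read from a file still ends with a newline, and leaving it in the URL is a common reason for failed requests.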