Problems scraping dictionary data

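Deleting the unwanted elements takes a lot of code, and there is no guarantee that the right things get deleted. Is there a better way to handle this?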

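You just have to be extra careful; there is no good way around it. Back up the original file first so you can recover if you delete the wrong thing.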
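A minimal sketch of the backup step, using only the standard library. The helper name `backup_file` and the `.bak` suffix are made up for illustration; the idea is just to keep an untouched copy before any cleanup script rewrites the file.

```python
import shutil
from pathlib import Path


def backup_file(path: str, suffix: str = ".bak") -> Path:
    """Copy `path` to `path + suffix` unless that backup already exists."""
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if not dst.exists():  # never overwrite an earlier, known-good backup
        shutil.copy2(src, dst)
    return dst
```

Calling `backup_file('page.html')` before each cleanup run leaves `page.html.bak` as the original, no matter how many times the cleanup is repeated.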


How should this "see more" be handled?

Clicking it seems to trigger this?

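First you need to learn the requests and bs4 libraries, and then how to use proxies and multithreading. Below is a simple example that scrapes the Longman English-Spanish dictionary. A real crawl would take a lot more work.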

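import requests
from bs4 import BeautifulSoup as bs

# Read the browser cookie saved in cookie.txt (utf-8-sig drops a BOM if present)
with open('cookie.txt', encoding='utf-8-sig') as f:
    cookie = f.read().strip()
headers = {'cookie': cookie, 'User-Agent': 'Edge/120.0.0.0'}
# The forum replaced the original link with its page title; put the real page URL here
r = requests.get('A (REAL) TROOPER - Spanish translation - Longman', headers=headers)
soup = bs(r.text, 'lxml')
# Select the main dictionary container and print its text
data = soup.select_one('div[class=dictionary]')
print(data.get_text())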

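The "see more" data is stored in a script string named SD_COMPONENT_DATA. You can extract that string, convert it to JSON, and then assemble the corresponding HTML fragment yourself, using jinja2 as mentioned earlier in post #28.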
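The extraction step could look roughly like the sketch below. It assumes the page contains an assignment of the form `SD_COMPONENT_DATA = {...}` whose right-hand side is plain JSON; the exact layout of the real script was not checked here, and `extract_component_data` and the sample HTML are made up for illustration. Rendering with jinja2 is left out.

```python
import json
import re


def extract_component_data(html: str) -> dict:
    """Return the object assigned to SD_COMPONENT_DATA in an inline script."""
    # Assumption: the page contains `SD_COMPONENT_DATA = {...}` as plain JSON.
    match = re.search(r"SD_COMPONENT_DATA\s*=\s*", html)
    if match is None:
        raise ValueError("SD_COMPONENT_DATA not found")
    # raw_decode parses one JSON value and ignores whatever follows it,
    # so the trailing `;` and the rest of the script do not matter.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Made-up sample, only to show the call
sample = '<script>window.SD_COMPONENT_DATA = {"word": "agua", "n": [1, 2]};</script>'
print(extract_component_data(sample))  # {'word': 'agua', 'n': [1, 2]}
```

Using `raw_decode` instead of a regex that captures up to the closing brace avoids having to guess where the object ends when it contains nested braces.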

How do you get the JSON out of this HTML source? After scraping, I found that the examples were all gone, and saving to a local file did not succeed.

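I just took a look, and I could not find anything related to "see more" in the JSON. I could not find the "see more" content in the HTML source either, yet the local copy can still be clicked to expand... This is more complicated than I imagined.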

The "see more" content is in the HTML source; you have already downloaded it successfully. The example data requires a second request:

https://examples1.spanishdict.com/explore?lang=es&q=agua&numExplorationSentences=100

Response:

{
    "params": {
        "q": "agua",
        "lang": "es",
        "numExplorationSentences": 100
    },
    "data": {
        "totalHits": 10000,
        "sentences": [
            {
                "id": "IbUrgocBmWHIBpHrg4Ki",
                "source": "Puro o diluido en un vaso de <em>agua</em> (200 ml).",
                "target": "Pure or diluted in a glass of <em>water</em> (200 ml).",
                "corpus": "paracrawl"
            },
            {
                "id": "AChqgYcBLC2PqRpsZL8H",
                "source": "En este lugar, la ausencia de <em>agua</em> es casi absoluta.",
                "target": "In this place, the absence of <em>water</em> is almost absolute.",
                "corpus": "paracrawl"
            },
            {
                "id": "xpTIgYcBmWHIBpHrbKd3",
                "source": "Aquí en San Antonio el <em>agua</em> es suave y limpia.",
                "target": "Here in San Antonio the <em>water</em> is soft and clean.",
                "corpus": "paracrawl"
            },
            {
                "id": "t3p6gYcBmWHIBpHraLEz",
                "source": "El <em>agua</em> es increíblemente transparente y turquesa en esta zona.",
                "target": "The <em>water</em> is incredibly transparent and turquoise in this area.",
                "corpus": "paracrawl"
            },
            {
                "id": "FC2QgIcBmWHIBpHrvxBI",
                "source": "Y un tiburón sabe cuando hay sangre en el <em>agua</em>.",
                "target": "And a shark knows when there's blood in the <em>water</em>.",
                "corpus": "open-subtitles"
            },
            {
                "id": "qZChgocBLC2PqRps6oqA",
                "source": "Ambos isómeros son insolubles en <em>agua</em>, pero miscibles con éter.",
                "target": "Both isomers are insoluble in <em>water</em>, but miscible with ether.",
                "corpus": "paracrawl"
            },
            {
                "id": "epSugocBLC2PqRpsl9Ds",
                "source": "Deshidratación (pérdida de <em>agua</em> y otros líquidos en el cuerpo)",
                "target": "Dehydration (loss of <em>water</em> and other fluids in the body)",
                "corpus": "paracrawl"
            },
            {
                "id": "7bUSg4cBLC2PqRpsM96q",
                "source": "Aplicación foliar: 1,5 ml/L de <em>agua</em> en tratamientos con vegetación.",
                "target": "Foliar application: 1,5 ml/L of <em>water</em> in treatments with vegetation.",
                "corpus": "paracrawl"
            },
            {
                "id": "0CRdgYcBLC2PqRps7520",
                "source": "Mantenga una botella de <em>agua</em> con usted durante el día.",
                "target": "Keep a bottle of <em>water</em> with you during the day.",
                "corpus": "paracrawl"
            },
            {
                "id": "5X-KgYcBmWHIBpHrNvf6",
                "source": "Lave el dedo con jabón y <em>agua</em> durante 5 minutos.",
                "target": "Wash the finger with soap and <em>water</em> for 5 minutes.",
                "corpus": "paracrawl"
            },
            {
                "id": "O4mmgYcBmWHIBpHrs3gw",
                "source": "Lave la herida con jabón y <em>agua</em> durante 5 minutos.",
                "target": "Wash the wound with soap and <em>water</em> for 5 minutes.",
                "corpus": "paracrawl"
            },
            {
                "id": "vPwAg4cBmWHIBpHr4JST",
                "source": "Calidad y acceso a <em>agua</em> potable en Tenerife alta (62)",
                "target": "Quality and access to drinking <em>water</em> in Tenerife high (62)",
                "corpus": "paracrawl"
            },
            {
                "id": "DqTfgocBLC2PqRps5vc1",
                "source": "Dosificación: usar 5 ml por cada 80 litros de <em>agua</em>.",
                "target": "Dosage: using 5 ml for each 80 litres of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "VLIIg4cBLC2PqRpsW4kJ",
                "source": "Calidad y acceso a <em>agua</em> potable en Varadero moderada (56)",
                "target": "Quality and access to drinking <em>water</em> in Varadero moderate (56)",
                "corpus": "paracrawl"
            },
            {
                "id": "Gk3ZgYcBLC2PqRps_qi0",
                "source": "Todos los asentamientos están equipados con electricidad, <em>agua</em> y gas.",
                "target": "All the settlements are equipped with electricity, <em>water</em> and gas.",
                "corpus": "paracrawl"
            },
            {
                "id": "4EHNgIcBmWHIBpHrWj5V",
                "source": "El mejor lugar para conservar <em>agua</em> es en tu cuerpo.",
                "target": "The best place to conserve <em>water</em> is in your body.",
                "corpus": "open-subtitles"
            },
            {
                "id": "mRxFgYcBLC2PqRpslnXK",
                "source": "Bueno, ha estado en el <em>agua</em> un par de días.",
                "target": "Well, it's been in the <em>water</em> a couple of days.",
                "corpus": "open-subtitles"
            },
            {
                "id": "2-CugocBmWHIBpHrlOFl",
                "source": "Mezclar con 4-8 onzas de <em>agua</em> o su bebida favorita.",
                "target": "Mix with 4-8 ounces of <em>water</em> or your favorite beverage.",
                "corpus": "paracrawl"
            },
            {
                "id": "b4yvgYcBmWHIBpHroFxz",
                "source": "Añadir a mi guía Aquavox Una experiencia en el <em>agua</em>.",
                "target": "Add to my guide Aquavox An experience in the <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "AF4OgocBLC2PqRpsR-va",
                "source": "Añadir 1 taza de <em>agua</em> a la sartén y revuelva.",
                "target": "Add 1 cup of <em>water</em> to the pan and stir.",
                "corpus": "paracrawl"
            },
            {
                "id": "XHaGgYcBATG92Ilv298c",
                "source": "Un apartamento en el <em>agua</em> ofrece todo esto y más.",
                "target": "An apartment on the <em>water</em> offers all this and more.",
                "corpus": "paracrawl"
            },
            {
                "id": "0WldgYcBATG92IlvjQd2",
                "source": "Tome esta medicina por boca con un vaso de <em>agua</em>.",
                "target": "Take this medicine by mouth with a glass of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "SHBcgYcBmWHIBpHro8q9",
                "source": "Tenga una botella de <em>agua</em> con usted durante el día.",
                "target": "Keep a bottle of <em>water</em> with you during the day.",
                "corpus": "paracrawl"
            },
            {
                "id": "1C16gYcBLC2PqRpsGu2W",
                "source": "Es la cantidad de carbonatos y bicarbonatos en el <em>agua</em>.",
                "target": "Is the amount of carbonates and bicarbonates in the <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "1nBbgYcBmWHIBpHrzoFz",
                "source": "Tome este medicamento por boca con un vaso de <em>agua</em>.",
                "target": "Take this medicine by mouth with a glass of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "okjKgYcBLC2PqRpsLls9",
                "source": "Luz y <em>agua</em> conectados a la red general, Aire acondicionado.",
                "target": "Light and <em>water</em> connected to the general network, Air conditioning.",
                "corpus": "paracrawl"
            },
            {
                "id": "j4WbgYcBmWHIBpHrd9XI",
                "source": "Lave el dedo con jabón y <em>agua</em> durante 5 minutos.",
                "target": "Wash the toe with soap and <em>water</em> for 5 minutes.",
                "corpus": "paracrawl"
            },
            {
                "id": "YZXLgYcBmWHIBpHrVpM9",
                "source": "En 2 litros de <em>agua</em> a hervir una cebolla entera.",
                "target": "In 2 liters of <em>water</em> to boil a whole onion.",
                "corpus": "paracrawl"
            },
            {
                "id": "JXR_gYcBATG92IlvOVXB",
                "source": "Tan sencillo, perfecto y delicado como una gota de <em>agua</em>.",
                "target": "So simple, perfect and delicate like a drop of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "96z3gocBLC2PqRpsQNY5",
                "source": "Escuchar Mahjong - Castillo de <em>agua</em> juegos relacionados y actualizaciones.",
                "target": "Play Mahjong - Castle on <em>water</em> related games and updates.",
                "corpus": "paracrawl"
            },
            {
                "id": "5pKogocBLC2PqRpseb4M",
                "source": "Sumerja el parche en <em>agua</em> de 2 a 6 segundos.",
                "target": "Dip the patch in <em>water</em> for 2 to 6 seconds.",
                "corpus": "paracrawl"
            },
            {
                "id": "xipvgYcBLC2PqRpsuYhW",
                "source": "En verano, lleva protección solar y una botella de <em>agua</em>.",
                "target": "In summer, bring solar protection and a bottle of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "vawQgocBmWHIBpHrG3aT",
                "source": "Tome Noroxin con un vaso lleno de <em>agua</em> (8 onzas).",
                "target": "Take Noroxin with a full glass of <em>water</em> (8 ounces).",
                "corpus": "paracrawl"
            },
            {
                "id": "gIyVgocBLC2PqRpss4Ok",
                "source": "Inmediatamente antes del procedimiento, enjuague su boca con <em>agua</em> hervida.",
                "target": "Immediately before the procedure, rinse your mouth with boiled <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "aXRogYcBmWHIBpHrGKGL",
                "source": "Lávese las manos durante 30 segundos con jabón y <em>agua</em>.",
                "target": "Wash your hands for 30 seconds with soap and <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "cDibgYcBLC2PqRpsLLUn",
                "source": "Jim y Steve tenían un tiempo fantástico en el <em>agua</em>.",
                "target": "Jim and Steve had a fantastic time on the <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "vqwQgocBmWHIBpHrG3aT",
                "source": "Tome Cipro con un vaso lleno de <em>agua</em> (8 onzas).",
                "target": "Take Cipro with a full glass of <em>water</em> (8 ounces).",
                "corpus": "paracrawl"
            },
            {
                "id": "hnWDgYcBATG92IlvbLyE",
                "source": "Criar vacas requiere una enorme cantidad de comida y <em>agua</em>.",
                "target": "Raising cows requires an enormous amount of food and <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "cfPmgocBmWHIBpHritPP",
                "source": "Remoje girasol por 4-5 horas en <em>agua</em> y enjuague bien.",
                "target": "Soak sunflower for 4-5 hours in <em>water</em> and rinse well.",
                "corpus": "paracrawl"
            },
            {
                "id": "c4OTgYcBmWHIBpHr_FpX",
                "source": "Los Duraplus también puede sobrevivir en un metro de <em>agua</em>.",
                "target": "The DuraPlus can also survive in a meter of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "VpjqgYcBATG92Ilv2pqa",
                "source": "Senderismo en un entorno privilegiado rodeado de montañas y <em>agua</em>.",
                "target": "Hiking in a privileged environment surrounded by mountains and <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "mJrYgYcBmWHIBpHr7Rki",
                "source": "Cubra los frijoles con 3 veces su cantidad en <em>agua</em>.",
                "target": "Cover the beans with 3 times their amount in <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "8qkFgocBmWHIBpHr4SBV",
                "source": "Las vacas deben pastar en lugares verdes y con <em>agua</em>.",
                "target": "The cows should graze in green places and with <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "X21pgYcBATG92Ilvighi",
                "source": "El <em>agua</em> es muy bonita en las playas de Ko-Larn.",
                "target": "The <em>water</em> is very nice in the beaches of Ko-Larn.",
                "corpus": "paracrawl"
            },
            {
                "id": "8G1rgYcBATG92Ilv6tnH",
                "source": "Tomar 2 cápsulas al día con un vaso de <em>agua</em>.",
                "target": "Take 2 capsules per day with a glass of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "XLhLgocBATG92Ilv4s9y",
                "source": "Una gota y el océano son esencialmente H2O o <em>agua</em>.",
                "target": "A drop and the ocean are essentially H20 or <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "IYvEgYcBATG92IlvPKDN",
                "source": "El posible uso de <em>agua</em> en la escultura es intrigante.",
                "target": "The possible use of <em>water</em> in the sculpture is intriguing.",
                "corpus": "paracrawl"
            },
            {
                "id": "eqLygYcBmWHIBpHre6vH",
                "source": "Elija su preferida y relájese en el <em>agua</em> a 33°C.",
                "target": "Choose your favorite and relax in the <em>water</em> at 33°C.",
                "corpus": "paracrawl"
            },
            {
                "id": "8HWCgYcBATG92IlvHk-1",
                "source": "Enjuague los ojos con una pequeña cantidad de <em>agua</em> tibia.",
                "target": "Rinse the eyes with a small amount of warm <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "Dnt9gYcBmWHIBpHrisWy",
                "source": "Peter y Al tuvieron un día fantástico en el <em>agua</em>.",
                "target": "Peter and Al had a fantastic day on the <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "F-XRgocBATG92IlvoW_d",
                "source": "Dada su alta concentración, emplear F-700 siempre diluido con <em>agua</em>.",
                "target": "Given its high concentration, use F-700 always diluted with <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "IqIHgocBATG92IlvQi1U",
                "source": "Procesar todos los ingredientes y agregar 1 vaso de <em>agua</em>.",
                "target": "Process all the ingredients and add 1 cup of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "jbQ_gocBATG92IlvlrTP",
                "source": "Instrucciones: Vierta 2 tazas de <em>agua</em> fría en la licuadora.",
                "target": "Directions: Pour 2 cups of cold <em>water</em> into the blender.",
                "corpus": "paracrawl"
            },
            {
                "id": "TFwGgocBLC2PqRpstk3F",
                "source": "En una pequeña taza, combinar la linaza y <em>agua</em> tibia.",
                "target": "In a small mug, combine the flaxseed and warm <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "_EXCgYcBLC2PqRpsottX",
                "source": "Coloca la carne y condimentos en una olla con <em>agua</em>.",
                "target": "Put the meat and seasonings in a saucepan with <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "wn6FgYcBmWHIBpHronH0",
                "source": "Miles de personas están sin electricidad, alcantarillado o <em>agua</em> potable.",
                "target": "Thousands of people are without electricity, sewage or drinking <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "h79JgocBmWHIBpHrI0R-",
                "source": "Añadir <em>agua</em> destilada para llenar el resto y agitar bien.",
                "target": "Add distilled <em>water</em> to fill the remainder and shake well.",
                "corpus": "paracrawl"
            },
            {
                "id": "mzWQgYcBLC2PqRps22to",
                "source": "El estudiante piens <em>agua</em>, añadir un vaso a los labios.",
                "target": "The student piens <em>water</em>, add a glass to the lips.",
                "corpus": "paracrawl"
            },
            {
                "id": "UydlgYcBLC2PqRpspSOc",
                "source": "Los comprimidos deben ser disueltos en un vaso de <em>agua</em>.",
                "target": "The tablets should be dissolved in a glass of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "3nJhgYcBmWHIBpHrX3Ys",
                "source": "Se puede llegar allí por dos medios, aire y <em>agua</em>.",
                "target": "You can get there by two means, air and <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "TsFPgocBmWHIBpHrxXKb",
                "source": "Calentar 100ml de <em>agua</em> en el microondas durante 1-2 minutos.",
                "target": "Heat 100ml of <em>water</em> in the microwave for 1-2 minutes.",
                "corpus": "paracrawl"
            },
            {
                "id": "U7VAgocBATG92Ilvkwfa",
                "source": "No utilice <em>agua</em>, jabón o alcohol para retirar la cera.",
                "target": "Don't use <em>water</em>, soap or alcohol to remove the wax.",
                "corpus": "paracrawl"
            },
            {
                "id": "ELUrgocBmWHIBpHrgn7v",
                "source": "Tome Doxycycline con un vaso de <em>agua</em> lleno (8 onzas).",
                "target": "Take Doxycycline with a full glass of <em>water</em> (8 ounces).",
                "corpus": "paracrawl"
            },
            {
                "id": "gKgCgocBmWHIBpHrqiD6",
                "source": "Ingerir un mínimo de 84 onzas de <em>agua</em> cada día.",
                "target": "Ingest a minimum of 84 ounces of <em>water</em> every day.",
                "corpus": "paracrawl"
            },
            {
                "id": "VsZ1gocBATG92Ilv4MtB",
                "source": "Situado en el <em>agua</em> delante, Bungalows proporcionar una excelente vista.",
                "target": "Located on the <em>water</em> front, Bungalows provide an excellent view.",
                "corpus": "paracrawl"
            },
            {
                "id": "G6TfgocBLC2PqRpsb9fm",
                "source": "Base móvil: Se puede llenar con <em>agua</em> o arena (aprox.",
                "target": "Mobile base: Can be filled with <em>water</em> or sand (approx.",
                "corpus": "paracrawl"
            },
            {
                "id": "trUsgocBmWHIBpHrgt-z",
                "source": "Un lugar tranquilo con posibilidad de <em>agua</em> grifos y descarga.",
                "target": "A quiet place with possibility of <em>water</em> taps and discharge.",
                "corpus": "paracrawl"
            },
            {
                "id": "0TK5gIcBATG92Ilvzp1Q",
                "source": "Marko parece un tiburón con sangre en el <em>agua</em>.",
                "target": "Marko looks like a shark with blood in the <em>water</em>.",
                "corpus": "open-subtitles"
            },
            {
                "id": "sn2CgYcBmWHIBpHrR01B",
                "source": "Otro desafío fue la presencia de <em>agua</em> en la roca.",
                "target": "Another challenge was the presence of <em>water</em> in the rock.",
                "corpus": "paracrawl"
            },
            {
                "id": "_m5sgYcBATG92IlvwB99",
                "source": "La región es muy fértil y bien provisto de <em>agua</em>.",
                "target": "The region is very fertile and well supplied with <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "bKXigocBLC2PqRpsufTg",
                "source": "Mezclar 2 dosificadores con 300-400ml de <em>agua</em> fría por servicio.",
                "target": "Mix 2 scoops with 300-400ml of cold <em>water</em> per service.",
                "corpus": "paracrawl"
            },
            {
                "id": "YTq5gIcBmWHIBpHrz6uR",
                "source": "Matty era bueno en todo, pero le encantaba el <em>agua</em>.",
                "target": "Matty was good at everything, but he loved the <em>water</em>.",
                "corpus": "open-subtitles"
            },
            {
                "id": "p6MMgocBATG92IlvJL5O",
                "source": "Lake Louise es un cuerpo largo y estrecho de <em>agua</em>.",
                "target": "Lake Louise is a long and narrow body of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "OIikgYcBmWHIBpHrx8hz",
                "source": "Tal vez, podría ser un filtro de aire o <em>agua</em>.",
                "target": "Perhaps, it could be a filter for air or <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "r5rvgYcBATG92IlvlyJt",
                "source": "Su calor y <em>agua</em> ya están incluidos en las evaluaciones.",
                "target": "Your heat and <em>water</em> are already included in the assessments.",
                "corpus": "paracrawl"
            },
            {
                "id": "S5PbgYcBATG92Ilv95a_",
                "source": "En hojas de Creta Mnemosyne <em>agua</em> está a la derecha.",
                "target": "In sheets from Crete Mnemosyne <em>water</em> is to the right.",
                "corpus": "paracrawl"
            },
            {
                "id": "TnVsgYcBmWHIBpHrXvwX",
                "source": "Lava el arroz glutinoso en <em>agua</em> dos o tres veces.",
                "target": "Wash the sticky rice in <em>water</em> two or three times.",
                "corpus": "paracrawl"
            },
            {
                "id": "Eo7LgYcBATG92IlvtCHO",
                "source": "La solución de ácido sulfúrico y <em>agua</em> se denomina electrolito.",
                "target": "The solution of sulfuric acid and <em>water</em> is called electrolyte.",
                "corpus": "paracrawl"
            },
            {
                "id": "D2Y8gYcBmWHIBpHrHwkO",
                "source": "Hay comida y <em>agua</em> en el fondo de la cueva.",
                "target": "There's food and <em>water</em> in the back of the cave.",
                "corpus": "open-subtitles"
            },
            {
                "id": "jPkNg4cBATG92IlvLAx6",
                "source": "Local comercial en ponferrada zona la cemba. extras: <em>agua</em>, luz.",
                "target": "Commercial venue in ponferrada zone the cemba. extras: <em>water</em>, light.",
                "corpus": "paracrawl"
            },
            {
                "id": "1G9vgYcBATG92IlvpxEZ",
                "source": "Acceso y control de la tierra, <em>agua</em> y recursos naturales.",
                "target": "Access and control of the land, <em>water</em> and natural resources.",
                "corpus": "paracrawl"
            },
            {
                "id": "5fXQgIcBLC2PqRps3WWK",
                "source": "No hay nada malo con el <em>agua</em> en este edificio.",
                "target": "There is nothing wrong with the <em>water</em> in this building.",
                "corpus": "open-subtitles"
            },
            {
                "id": "gro6gocBmWHIBpHrtXwL",
                "source": "Mantén el producto durante 5-10 minutos y aclara con <em>agua</em>.",
                "target": "Keep the product for 5-10 minutes and rinse with <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "pq8xgocBATG92IlvePxs",
                "source": "Juan también explica su bautismo es un bautismo de <em>agua</em>.",
                "target": "John also explains his baptism is a baptism of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "XoWZgYcBmWHIBpHrUxzy",
                "source": "El <em>agua</em> está ahí, reservada en los mares y océanos.",
                "target": "The <em>water</em> is there, reserved in the seas and oceans.",
                "corpus": "paracrawl"
            },
            {
                "id": "Ru3TgocBmWHIBpHrEzw0",
                "source": "Añadir un volumen de <em>agua</em> desionizada con suficiente agitación; 3.",
                "target": "Add a volume of deionized <em>water</em> with sufficiently stirred; 3.",
                "corpus": "paracrawl"
            },
            {
                "id": "B3BbgYcBmWHIBpHrunlV",
                "source": "En verano, lleva un sombrero y una botella de <em>agua</em>.",
                "target": "In summer, bring a hat and a bottle of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "8CtzgYcBLC2PqRpssddP",
                "source": "Siempre tome estos medicamentos con un vaso lleno de <em>agua</em>.",
                "target": "Always take these medications with a full glass of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "PJDRgYcBATG92IlvqiO8",
                "source": "Si la mezcla parece seca, añadir una cucharada de <em>agua</em>.",
                "target": "If the mixture seems dry add a tablespoon of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "M2hbgYcBATG92IlvL0aW",
                "source": "Sin embargo, la calidad de <em>agua</em> potable es muy importante.",
                "target": "However, the quality of drinking <em>water</em> is very important.",
                "corpus": "paracrawl"
            },
            {
                "id": "4YusgYcBmWHIBpHrGzW6",
                "source": "Y es muy legible en el <em>agua</em> con buena precisión.",
                "target": "And it is very legible in the <em>water</em> with good precision.",
                "corpus": "paracrawl"
            },
            {
                "id": "r4KpgYcBATG92IlvqbxF",
                "source": "Tiene cuatro ruinas y un molino de <em>agua</em> para recuperar.",
                "target": "It has four ruins and a <em>water</em> mill to recover.",
                "corpus": "paracrawl"
            },
            {
                "id": "ty9_gYcBLC2PqRpsTaw8",
                "source": "La botella debe ser pre-enjuagar con <em>agua</em> de la misma.",
                "target": "The bottle should be pre-rinsed with <em>water</em> of the same.",
                "corpus": "paracrawl"
            },
            {
                "id": "Hn6GgYcBmWHIBpHryNQX",
                "source": "Mojar con un buen vino tinto y muy poca <em>agua</em>.",
                "target": "Moisten with a good red wine and very little <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "EnaGgYcBATG92Ilvi8m7",
                "source": "Llena la piña con al menos una pulgada de <em>agua</em>.",
                "target": "Fill the pineapple with at least an inch of <em>water</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "_DnPgIcBATG92Ilvy_Rv",
                "source": "El <em>agua</em> es el mejor lugar para esconder un diamante.",
                "target": "The <em>water</em> is the best place to hide a diamond.",
                "corpus": "open-subtitles"
            },
            {
                "id": "5IyugYcBmWHIBpHrtgrC",
                "source": "Los ingredientes se mezclan con <em>agua</em> para formar una pasta.",
                "target": "The ingredients are mixed with <em>water</em> to form a paste.",
                "corpus": "paracrawl"
            },
            {
                "id": "B3BAgocBLC2PqRpsvS8R",
                "source": "Reacciona lentamente con el <em>agua</em> y es soluble en ácidos.",
                "target": "It reacts slowly with <em>water</em> and is soluble in acids.",
                "corpus": "paracrawl"
            },
            {
                "id": "sZG_gYcBmWHIBpHr88zi",
                "source": "Es este movimiento de <em>agua</em> el que crea la instalación.",
                "target": "It is this movement of <em>water</em> that creates the installation.",
                "corpus": "paracrawl"
            },
            {
                "id": "Rrk2gocBmWHIBpHrMgc9",
                "source": "Los aeropuertos de Yakarta y Singapur estarán bajo el <em>agua</em>.",
                "target": "The airports of Jakarta and Singapore will be under <em>water</em>.",
                "corpus": "paracrawl"
            }
        ],
        "translations": [
            {
                "translation": "water",
                "count": 250432,
                "isFacet": true
            }
        ]
    }
}
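A response shaped like the one above could be requested and flattened roughly as follows. This is a sketch, not a tested client: it uses the standard library's `urllib` instead of requests so that it has no dependencies, the function names are made up, and whether the endpoint needs extra headers or cookies was not verified.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

EXPLORE_URL = "https://examples1.spanishdict.com/explore"


def build_explore_url(word: str, lang: str = "es", count: int = 100) -> str:
    """Build the second-request URL for a word, as in the post above."""
    query = urlencode({"lang": lang, "q": word, "numExplorationSentences": count})
    return f"{EXPLORE_URL}?{query}"


def sentence_pairs(payload: dict) -> list[tuple[str, str]]:
    """Return (source, target) pairs with the <em> highlight tags removed."""
    strip_em = re.compile(r"</?em>")
    return [
        (strip_em.sub("", s["source"]), strip_em.sub("", s["target"]))
        for s in payload["data"]["sentences"]
    ]


def fetch_examples(word: str) -> list[tuple[str, str]]:
    # Network call; the server may require headers or cookies (unverified assumption)
    with urlopen(build_explore_url(word), timeout=10) as resp:
        return sentence_pairs(json.load(resp))
```

Keeping `sentence_pairs` separate from the network call means it can also be run on a response that was already saved to disk.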

What about https://www.spanishdict.com/examples/pan?lang=es&page=12 then? What is the page parameter, and how do I find it?

https://examples1.spanishdict.com/explore?lang=es&q=pan&numExplorationSentences=100

Should it be changed to

https://examples1.spanishdict.com/explore?lang=es&q=pan&numExplorationSentences=100&&page=12

to get that page? I tried it, though, and the result does not change.

Paginated request:

https://examples1.spanishdict.com/sentences?lang=es&q=pan&page=6&pageSize=20
{
    "params": {
        "q": "pan",
        "lang": "es",
        "page": 6,
        "pageSize": 20
    },
    "data": {
        "totalHits": 10000,
        "sentences": [
            {
                "id": "3Du7gIcBmWHIBpHrfC4e",
                "source": "Mi vida se trata de poner <em>pan</em> sobre la mesa.",
                "target": "My life is about thisputting <em>bread</em> on the table.",
                "corpus": "open-subtitles"
            },
            {
                "id": "8KDrgYcBmWHIBpHrPjiX",
                "source": "En café del pueblo donde usted puede conseguir <em>pan</em> y bebida.",
                "target": "In village cafe where you can get <em>bread</em> and drink.",
                "corpus": "paracrawl"
            },
            {
                "id": "SjOKgYcBLC2PqRpsXFDR",
                "source": "El <em>pan</em> y la copa son dos símbolos separados.",
                "target": "The <em>bread</em> and the cup are two separate symbols.",
                "corpus": "paracrawl"
            },
            {
                "id": "YJXLgYcBmWHIBpHrtq2i",
                "source": "Sonrisa gustosa como la harina y buena como el <em>pan</em>.",
                "target": "Pleasant smile as the flour and good as the <em>bread</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "HyVigYcBLC2PqRpsIvHb",
                "source": "Les dice que Él es el <em>pan</em> de vida.",
                "target": "He says that He is the <em>bread</em> of life.",
                "corpus": "paracrawl"
            },
            {
                "id": "KHtjgocBLC2PqRpsFtMV",
                "source": "Pronuncia sobre el <em>pan</em> la oración de alabanza y bendición.",
                "target": "He recites over the <em>bread</em> the prayer of praise and blessing.",
                "corpus": "paracrawl"
            },
            {
                "id": "jMyGgocBATG92Ilv5YtZ",
                "source": "Carolina y Renée durante la celebración del compartir del <em>pan</em>.",
                "target": "Carolina and Renée during the celebration for the <em>bread</em> sharing.",
                "corpus": "paracrawl"
            },
            {
                "id": "I2cmgocBLC2PqRps7V2M",
                "source": "Si el <em>pan</em> se seca, agregar el agua en pequeños incrementos.",
                "target": "If the <em>pan</em> gets dry, add water in small increments.",
                "corpus": "paracrawl"
            },
            {
                "id": "i3p5gYcBmWHIBpHrVmCa",
                "source": "Se comió con el <em>pan</em> como una comida completa.",
                "target": "He ate with the <em>bread</em> as a full meal.",
                "corpus": "paracrawl"
            },
            {
                "id": "TKLZgocBLC2PqRpstvqT",
                "source": "Buen <em>pan</em> se hace con levadura, preferiblemente un bio cereales orgánicos.",
                "target": "Good <em>bread</em> is made with yeast, preferably a bio organic cereals.",
                "corpus": "paracrawl"
            },
            {
                "id": "KTibgYcBLC2PqRpsLLYn",
                "source": "Por lo tanto, debemos tomar este mensaje como nuestro <em>pan</em> espiritual.",
                "target": "Therefore, we must take these messages as our spiritual <em>bread</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "8WIagocBLC2PqRpsL_cW",
                "source": "Es justo hoy que bendecimos, descanso y compartir el <em>pan</em>.",
                "target": "It is fitting today that we bless, break and share <em>bread</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "HnhzgYcBmWHIBpHrjHmJ",
                "source": "Nos sentamos en la misma mesa, com-partimos el mismo <em>pan</em>.",
                "target": "We sat around the same table, shared the same <em>bread</em>.",
                "corpus": "paracrawl"
            },
            {
                "id": "NHVpgYcBmWHIBpHr7T_5",
                "source": "Si lo desea, puede freír el <em>pan</em> en la tostadora.",
                "target": "If you like, you can fry the <em>bread</em> in the toaster.",
                "corpus": "paracrawl"
            },
            {
                "id": "Kj-ugYcBLC2PqRpsxkZY",
                "source": "Todos deben ganar su <em>pan</em> diario sobre la tierra.",
                "target": "All must earn their daily <em>bread</em> upon the earth.",
                "corpus": "paracrawl"
            },
            {
                "id": "eZXigYcBATG92IlvYbb0",
                "source": "Normalmente es de autoservicio e incluye <em>pan</em>, café y cereales.",
                "target": "It is usually self-service and includes <em>bread</em>, coffee, cereal.",
                "corpus": "paracrawl"
            },
            {
                "id": "3XeJgYcBATG92Ilvrdmr",
                "source": "El <em>pan</em> del cielo es su carne y su sangre.",
                "target": "The <em>bread</em> from heaven is his flesh and blood.",
                "corpus": "paracrawl"
            },
            {
                "id": "XQIpg4cBATG92IlvBV36",
                "source": "Y ese gorjeo que haces cuando quieres más <em>pan</em>.",
                "target": "And that chirp you do when you want more <em>bread</em>.",
                "corpus": "subtitulos"
            },
            {
                "id": "DHN7gYcBATG92IlvyCYF",
                "source": "La bebida y el <em>pan</em> también están incluidos en el precio.",
                "target": "The drink and <em>bread</em> are also included in the price.",
                "corpus": "paracrawl"
            },
            {
                "id": "NbAygocBATG92IlvFTbN",
                "source": "Es más galletas picadas y corte el <em>pan</em> en rebanadas.",
                "target": "It is further chopped biscuits and <em>bread</em> cut into slices.",
                "corpus": "paracrawl"
            }
        ]
    }
}
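A minimal sketch of how the paged request above could be scripted, using only the parameters visible in that URL (`lang`, `q`, `page`, `pageSize`). The helper names are hypothetical, the standard-library `urllib` is used here in place of `requests`, and the `User-Agent` value is an assumption; the endpoint may also require cookies or other headers not shown in this thread.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the paged request shown above.
BASE = "https://examples1.spanishdict.com/sentences"


def build_page_url(query, page, page_size=20, lang="es"):
    """Build the URL for one page of example sentences."""
    params = {"lang": lang, "q": query, "page": page, "pageSize": page_size}
    return f"{BASE}?{urlencode(params)}"


def strip_highlight(text):
    """Remove the <em>...</em> markers wrapped around the search term."""
    return re.sub(r"</?em>", "", text)


def parse_sentences(payload):
    """Turn a decoded JSON response into (source, target) pairs."""
    return [
        (strip_highlight(s["source"]), strip_highlight(s["target"]))
        for s in payload["data"]["sentences"]
    ]


def fetch_page(query, page, page_size=20):
    """Download and decode one page (performs a network request)."""
    # The User-Agent is an assumption; the site may expect more headers.
    req = Request(build_page_url(query, page, page_size),
                  headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

To walk several pages, call `parse_sentences(fetch_page("pan", n))` for each page number `n`, with a pause between requests.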

OK, that works. So how do I scrape the images and audio?

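Extract all the links, save them to a text file, and paste them into any tool that supports batch downloading, such as Xunlei or IDM. If there are a lot of links, I recommend aria2 for batch downloading from the command line: it has many configuration options and is more convenient than writing your own code.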
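A rough sketch of the "extract the links and save them to a text file" step. It uses the standard-library `html.parser` rather than bs4 so that it runs without extra installs; the class and function names, the list of file extensions, and the `urls.txt` filename are all illustrative assumptions, and the real pages may reference media in attributes other than `src` and `href`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Extensions treated as images or audio (an assumed, non-exhaustive list).
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp",
                    ".mp3", ".ogg", ".wav")


class MediaLinkCollector(HTMLParser):
    """Collect absolute URLs of image/audio files found in src/href attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                url = urljoin(self.base_url, value)
                # Ignore the query string when checking the extension.
                path = url.lower().split("?")[0]
                if path.endswith(MEDIA_EXTENSIONS) and url not in self.urls:
                    self.urls.append(url)


def extract_media_urls(html, base_url):
    """Return de-duplicated media URLs in document order."""
    collector = MediaLinkCollector(base_url)
    collector.feed(html)
    return collector.urls


def save_urls(urls, path="urls.txt"):
    """Write one URL per line, the format aria2's input file expects."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

The resulting file can then be handed to aria2 with `aria2c -i urls.txt -d downloads`, where `downloads` is just an example output directory.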


I got a 403 as soon as I started scraping. Does that mean the scraper was detected?

How do I scrape the wordlists behind this three-level index?

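If the URL opens normally when you paste it into a browser, the site is not blocked by the firewall; you have probably been identified as a crawler.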

I just tried it: the site opens normally in the browser, but it cannot be scraped. Very strange...
Site:

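You could try Playwright. It is more complex, but there are many anti-anti-crawler plugins on GitHub that can help Playwright avoid being identified as a crawler, or get past CAPTCHAs.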
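A minimal sketch of the Playwright route, assuming the `playwright` package and its Chromium build are installed (`pip install playwright`, then `playwright install chromium`). The function name is hypothetical, and plain Playwright as shown here does nothing special to avoid detection; the plugins mentioned above would have to be added separately.

```python
def fetch_rendered_html(url, timeout_ms=30000):
    """Open `url` in headless Chromium and return the rendered HTML."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles so that content loaded
            # by JavaScript is present in the returned HTML.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML string can be passed to bs4 in the same way as `r.text` from a requests call.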