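If I delete the unwanted elements myself, that takes a lot of code, and I can't be sure everything gets deleted correctly. Is there a better way to handle this?
You just have to be careful; there is no shortcut. Back up the original file first so that a wrong deletion can be undone.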
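As a rough sketch of the "back up first" advice, the helper below copies the file before overwriting it and reports how many characters the edit removed, so an over-eager deletion is easy to notice. The function name `edit_with_backup` and the `.bak` suffix are my own choices, not something from this thread. The `transform` argument is any function that takes the file's text and returns the cleaned text; with bs4 it could parse the text, call `decompose()` on the tags returned by `select()`, and return `str(soup)`.

```python
import shutil
from pathlib import Path


def edit_with_backup(path, transform, suffix=".bak"):
    """Copy the file to path+suffix, then overwrite it with transform(text)."""
    path = Path(path)
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)  # keep an untouched copy before editing
    original = path.read_text(encoding="utf-8")
    cleaned = transform(original)
    path.write_text(cleaned, encoding="utf-8")
    # Report how much was removed so a wrong deletion is easy to spot
    return backup, len(original) - len(cleaned)
```

If the reported number looks too large, restore the file from the `.bak` copy and adjust the transform.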
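First you need to learn two libraries, requests and bs4; after that, you need to understand how to use proxies and multithreading. Below is a simple example that scrapes the Longman English-Spanish dictionary. For a real crawl you will still have a lot more work to do.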
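import requests
from bs4 import BeautifulSoup as bs

# Read the cookie string copied from the browser; utf-8-sig drops a BOM if present
with open('cookie.txt', encoding='utf-8-sig') as f:
    cookie = f.read().strip()  # strip the trailing newline so the header stays valid
headers = {'cookie': cookie, 'User-Agent': 'Edge/120.0.0.0'}
# The forum appears to have replaced the original URL with the page title;
# put the URL of the dictionary entry here
r = requests.get('A (REAL) TROOPER - Spanish translation - Longman', headers=headers)
soup = bs(r.text, 'lxml')
# Select the container that holds the dictionary entry
data = soup.select_one('div[class=dictionary]')
print(data.get_text())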
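The proxy and multithreading parts mentioned above could look roughly like this. It is only a sketch: it uses the standard library so it runs without extra installs, and the names `make_fetcher` and `fetch_all` are made up for illustration. With requests, the equivalent of the proxy handler is the `proxies=` argument of `requests.get`.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import ProxyHandler, Request, build_opener


def make_fetcher(proxy=None, user_agent="Edge/120.0.0.0", timeout=10):
    """Build a fetch(url) function, optionally routed through an HTTP proxy."""
    # `proxy` is a hypothetical address such as "http://127.0.0.1:8080"
    handlers = [ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
    opener = build_opener(*handlers)

    def fetch(url):
        req = Request(url, headers={"User-Agent": user_agent})
        with opener.open(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    return fetch


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Usage would be something like `pages = fetch_all(url_list, make_fetcher())`. Keep `max_workers` small; hammering the site from many threads is an easy way to get blocked.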
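The "see more" data is stored in a script string named SD_COMPONENT_DATA. You can extract that string, parse it as JSON, and then assemble the corresponding HTML fragments yourself, using jinja2 as mentioned earlier in post #28.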
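A minimal sketch of the extraction step, assuming the page assigns the data in an inline script in a form like `window.SD_COMPONENT_DATA = {...};` (I have not checked the exact form, so treat that as an assumption). The function name `extract_component_data` is hypothetical.

```python
import json
import re


def extract_component_data(html, var_name="SD_COMPONENT_DATA"):
    """Return the object assigned to var_name in an inline script, or None."""
    # Assumed form: `SD_COMPONENT_DATA = {...}` somewhere in the page source
    match = re.search(re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the closing `;` and `</script>`)
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data
```

If the assigned value is not strict JSON, `raw_decode` raises `json.JSONDecodeError`. The returned dict can then be passed to a jinja2 template to build the HTML fragment.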
How do I get the JSON out of this HTML source? After scraping, I found that the examples were all gone, and saving to disk did not work either.
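I just took a look: I could not find anything related to "see more" in the JSON, and the HTML source does not contain the "see more" content either, yet the local copy can still be clicked to expand... This is more complicated than I expected.
The "see more" content is in the HTML source, and you have already downloaded it successfully. The example data requires a second request: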
https://examples1.spanishdict.com/explore?lang=es&q=agua&numExplorationSentences=100
Response:
{
"params": {
"q": "agua",
"lang": "es",
"numExplorationSentences": 100
},
"data": {
"totalHits": 10000,
"sentences": [
{
"id": "IbUrgocBmWHIBpHrg4Ki",
"source": "Puro o diluido en un vaso de <em>agua</em> (200 ml).",
"target": "Pure or diluted in a glass of <em>water</em> (200 ml).",
"corpus": "paracrawl"
},
{
"id": "AChqgYcBLC2PqRpsZL8H",
"source": "En este lugar, la ausencia de <em>agua</em> es casi absoluta.",
"target": "In this place, the absence of <em>water</em> is almost absolute.",
"corpus": "paracrawl"
},
{
"id": "xpTIgYcBmWHIBpHrbKd3",
"source": "Aquí en San Antonio el <em>agua</em> es suave y limpia.",
"target": "Here in San Antonio the <em>water</em> is soft and clean.",
"corpus": "paracrawl"
},
{
"id": "t3p6gYcBmWHIBpHraLEz",
"source": "El <em>agua</em> es increíblemente transparente y turquesa en esta zona.",
"target": "The <em>water</em> is incredibly transparent and turquoise in this area.",
"corpus": "paracrawl"
},
{
"id": "FC2QgIcBmWHIBpHrvxBI",
"source": "Y un tiburón sabe cuando hay sangre en el <em>agua</em>.",
"target": "And a shark knows when there's blood in the <em>water</em>.",
"corpus": "open-subtitles"
},
{
"id": "qZChgocBLC2PqRps6oqA",
"source": "Ambos isómeros son insolubles en <em>agua</em>, pero miscibles con éter.",
"target": "Both isomers are insoluble in <em>water</em>, but miscible with ether.",
"corpus": "paracrawl"
},
{
"id": "epSugocBLC2PqRpsl9Ds",
"source": "Deshidratación (pérdida de <em>agua</em> y otros líquidos en el cuerpo)",
"target": "Dehydration (loss of <em>water</em> and other fluids in the body)",
"corpus": "paracrawl"
},
{
"id": "7bUSg4cBLC2PqRpsM96q",
"source": "Aplicación foliar: 1,5 ml/L de <em>agua</em> en tratamientos con vegetación.",
"target": "Foliar application: 1,5 ml/L of <em>water</em> in treatments with vegetation.",
"corpus": "paracrawl"
},
{
"id": "0CRdgYcBLC2PqRps7520",
"source": "Mantenga una botella de <em>agua</em> con usted durante el día.",
"target": "Keep a bottle of <em>water</em> with you during the day.",
"corpus": "paracrawl"
},
{
"id": "5X-KgYcBmWHIBpHrNvf6",
"source": "Lave el dedo con jabón y <em>agua</em> durante 5 minutos.",
"target": "Wash the finger with soap and <em>water</em> for 5 minutes.",
"corpus": "paracrawl"
},
{
"id": "O4mmgYcBmWHIBpHrs3gw",
"source": "Lave la herida con jabón y <em>agua</em> durante 5 minutos.",
"target": "Wash the wound with soap and <em>water</em> for 5 minutes.",
"corpus": "paracrawl"
},
{
"id": "vPwAg4cBmWHIBpHr4JST",
"source": "Calidad y acceso a <em>agua</em> potable en Tenerife alta (62)",
"target": "Quality and access to drinking <em>water</em> in Tenerife high (62)",
"corpus": "paracrawl"
},
{
"id": "DqTfgocBLC2PqRps5vc1",
"source": "Dosificación: usar 5 ml por cada 80 litros de <em>agua</em>.",
"target": "Dosage: using 5 ml for each 80 litres of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "VLIIg4cBLC2PqRpsW4kJ",
"source": "Calidad y acceso a <em>agua</em> potable en Varadero moderada (56)",
"target": "Quality and access to drinking <em>water</em> in Varadero moderate (56)",
"corpus": "paracrawl"
},
{
"id": "Gk3ZgYcBLC2PqRps_qi0",
"source": "Todos los asentamientos están equipados con electricidad, <em>agua</em> y gas.",
"target": "All the settlements are equipped with electricity, <em>water</em> and gas.",
"corpus": "paracrawl"
},
{
"id": "4EHNgIcBmWHIBpHrWj5V",
"source": "El mejor lugar para conservar <em>agua</em> es en tu cuerpo.",
"target": "The best place to conserve <em>water</em> is in your body.",
"corpus": "open-subtitles"
},
{
"id": "mRxFgYcBLC2PqRpslnXK",
"source": "Bueno, ha estado en el <em>agua</em> un par de días.",
"target": "Well, it's been in the <em>water</em> a couple of days.",
"corpus": "open-subtitles"
},
{
"id": "2-CugocBmWHIBpHrlOFl",
"source": "Mezclar con 4-8 onzas de <em>agua</em> o su bebida favorita.",
"target": "Mix with 4-8 ounces of <em>water</em> or your favorite beverage.",
"corpus": "paracrawl"
},
{
"id": "b4yvgYcBmWHIBpHroFxz",
"source": "Añadir a mi guía Aquavox Una experiencia en el <em>agua</em>.",
"target": "Add to my guide Aquavox An experience in the <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "AF4OgocBLC2PqRpsR-va",
"source": "Añadir 1 taza de <em>agua</em> a la sartén y revuelva.",
"target": "Add 1 cup of <em>water</em> to the pan and stir.",
"corpus": "paracrawl"
},
{
"id": "XHaGgYcBATG92Ilv298c",
"source": "Un apartamento en el <em>agua</em> ofrece todo esto y más.",
"target": "An apartment on the <em>water</em> offers all this and more.",
"corpus": "paracrawl"
},
{
"id": "0WldgYcBATG92IlvjQd2",
"source": "Tome esta medicina por boca con un vaso de <em>agua</em>.",
"target": "Take this medicine by mouth with a glass of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "SHBcgYcBmWHIBpHro8q9",
"source": "Tenga una botella de <em>agua</em> con usted durante el día.",
"target": "Keep a bottle of <em>water</em> with you during the day.",
"corpus": "paracrawl"
},
{
"id": "1C16gYcBLC2PqRpsGu2W",
"source": "Es la cantidad de carbonatos y bicarbonatos en el <em>agua</em>.",
"target": "Is the amount of carbonates and bicarbonates in the <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "1nBbgYcBmWHIBpHrzoFz",
"source": "Tome este medicamento por boca con un vaso de <em>agua</em>.",
"target": "Take this medicine by mouth with a glass of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "okjKgYcBLC2PqRpsLls9",
"source": "Luz y <em>agua</em> conectados a la red general, Aire acondicionado.",
"target": "Light and <em>water</em> connected to the general network, Air conditioning.",
"corpus": "paracrawl"
},
{
"id": "j4WbgYcBmWHIBpHrd9XI",
"source": "Lave el dedo con jabón y <em>agua</em> durante 5 minutos.",
"target": "Wash the toe with soap and <em>water</em> for 5 minutes.",
"corpus": "paracrawl"
},
{
"id": "YZXLgYcBmWHIBpHrVpM9",
"source": "En 2 litros de <em>agua</em> a hervir una cebolla entera.",
"target": "In 2 liters of <em>water</em> to boil a whole onion.",
"corpus": "paracrawl"
},
{
"id": "JXR_gYcBATG92IlvOVXB",
"source": "Tan sencillo, perfecto y delicado como una gota de <em>agua</em>.",
"target": "So simple, perfect and delicate like a drop of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "96z3gocBLC2PqRpsQNY5",
"source": "Escuchar Mahjong - Castillo de <em>agua</em> juegos relacionados y actualizaciones.",
"target": "Play Mahjong - Castle on <em>water</em> related games and updates.",
"corpus": "paracrawl"
},
{
"id": "5pKogocBLC2PqRpseb4M",
"source": "Sumerja el parche en <em>agua</em> de 2 a 6 segundos.",
"target": "Dip the patch in <em>water</em> for 2 to 6 seconds.",
"corpus": "paracrawl"
},
{
"id": "xipvgYcBLC2PqRpsuYhW",
"source": "En verano, lleva protección solar y una botella de <em>agua</em>.",
"target": "In summer, bring solar protection and a bottle of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "vawQgocBmWHIBpHrG3aT",
"source": "Tome Noroxin con un vaso lleno de <em>agua</em> (8 onzas).",
"target": "Take Noroxin with a full glass of <em>water</em> (8 ounces).",
"corpus": "paracrawl"
},
{
"id": "gIyVgocBLC2PqRpss4Ok",
"source": "Inmediatamente antes del procedimiento, enjuague su boca con <em>agua</em> hervida.",
"target": "Immediately before the procedure, rinse your mouth with boiled <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "aXRogYcBmWHIBpHrGKGL",
"source": "Lávese las manos durante 30 segundos con jabón y <em>agua</em>.",
"target": "Wash your hands for 30 seconds with soap and <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "cDibgYcBLC2PqRpsLLUn",
"source": "Jim y Steve tenían un tiempo fantástico en el <em>agua</em>.",
"target": "Jim and Steve had a fantastic time on the <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "vqwQgocBmWHIBpHrG3aT",
"source": "Tome Cipro con un vaso lleno de <em>agua</em> (8 onzas).",
"target": "Take Cipro with a full glass of <em>water</em> (8 ounces).",
"corpus": "paracrawl"
},
{
"id": "hnWDgYcBATG92IlvbLyE",
"source": "Criar vacas requiere una enorme cantidad de comida y <em>agua</em>.",
"target": "Raising cows requires an enormous amount of food and <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "cfPmgocBmWHIBpHritPP",
"source": "Remoje girasol por 4-5 horas en <em>agua</em> y enjuague bien.",
"target": "Soak sunflower for 4-5 hours in <em>water</em> and rinse well.",
"corpus": "paracrawl"
},
{
"id": "c4OTgYcBmWHIBpHr_FpX",
"source": "Los Duraplus también puede sobrevivir en un metro de <em>agua</em>.",
"target": "The DuraPlus can also survive in a meter of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "VpjqgYcBATG92Ilv2pqa",
"source": "Senderismo en un entorno privilegiado rodeado de montañas y <em>agua</em>.",
"target": "Hiking in a privileged environment surrounded by mountains and <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "mJrYgYcBmWHIBpHr7Rki",
"source": "Cubra los frijoles con 3 veces su cantidad en <em>agua</em>.",
"target": "Cover the beans with 3 times their amount in <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "8qkFgocBmWHIBpHr4SBV",
"source": "Las vacas deben pastar en lugares verdes y con <em>agua</em>.",
"target": "The cows should graze in green places and with <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "X21pgYcBATG92Ilvighi",
"source": "El <em>agua</em> es muy bonita en las playas de Ko-Larn.",
"target": "The <em>water</em> is very nice in the beaches of Ko-Larn.",
"corpus": "paracrawl"
},
{
"id": "8G1rgYcBATG92Ilv6tnH",
"source": "Tomar 2 cápsulas al día con un vaso de <em>agua</em>.",
"target": "Take 2 capsules per day with a glass of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "XLhLgocBATG92Ilv4s9y",
"source": "Una gota y el océano son esencialmente H2O o <em>agua</em>.",
"target": "A drop and the ocean are essentially H20 or <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "IYvEgYcBATG92IlvPKDN",
"source": "El posible uso de <em>agua</em> en la escultura es intrigante.",
"target": "The possible use of <em>water</em> in the sculpture is intriguing.",
"corpus": "paracrawl"
},
{
"id": "eqLygYcBmWHIBpHre6vH",
"source": "Elija su preferida y relájese en el <em>agua</em> a 33°C.",
"target": "Choose your favorite and relax in the <em>water</em> at 33°C.",
"corpus": "paracrawl"
},
{
"id": "8HWCgYcBATG92IlvHk-1",
"source": "Enjuague los ojos con una pequeña cantidad de <em>agua</em> tibia.",
"target": "Rinse the eyes with a small amount of warm <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "Dnt9gYcBmWHIBpHrisWy",
"source": "Peter y Al tuvieron un día fantástico en el <em>agua</em>.",
"target": "Peter and Al had a fantastic day on the <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "F-XRgocBATG92IlvoW_d",
"source": "Dada su alta concentración, emplear F-700 siempre diluido con <em>agua</em>.",
"target": "Given its high concentration, use F-700 always diluted with <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "IqIHgocBATG92IlvQi1U",
"source": "Procesar todos los ingredientes y agregar 1 vaso de <em>agua</em>.",
"target": "Process all the ingredients and add 1 cup of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "jbQ_gocBATG92IlvlrTP",
"source": "Instrucciones: Vierta 2 tazas de <em>agua</em> fría en la licuadora.",
"target": "Directions: Pour 2 cups of cold <em>water</em> into the blender.",
"corpus": "paracrawl"
},
{
"id": "TFwGgocBLC2PqRpstk3F",
"source": "En una pequeña taza, combinar la linaza y <em>agua</em> tibia.",
"target": "In a small mug, combine the flaxseed and warm <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "_EXCgYcBLC2PqRpsottX",
"source": "Coloca la carne y condimentos en una olla con <em>agua</em>.",
"target": "Put the meat and seasonings in a saucepan with <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "wn6FgYcBmWHIBpHronH0",
"source": "Miles de personas están sin electricidad, alcantarillado o <em>agua</em> potable.",
"target": "Thousands of people are without electricity, sewage or drinking <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "h79JgocBmWHIBpHrI0R-",
"source": "Añadir <em>agua</em> destilada para llenar el resto y agitar bien.",
"target": "Add distilled <em>water</em> to fill the remainder and shake well.",
"corpus": "paracrawl"
},
{
"id": "mzWQgYcBLC2PqRps22to",
"source": "El estudiante piens <em>agua</em>, añadir un vaso a los labios.",
"target": "The student piens <em>water</em>, add a glass to the lips.",
"corpus": "paracrawl"
},
{
"id": "UydlgYcBLC2PqRpspSOc",
"source": "Los comprimidos deben ser disueltos en un vaso de <em>agua</em>.",
"target": "The tablets should be dissolved in a glass of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "3nJhgYcBmWHIBpHrX3Ys",
"source": "Se puede llegar allí por dos medios, aire y <em>agua</em>.",
"target": "You can get there by two means, air and <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "TsFPgocBmWHIBpHrxXKb",
"source": "Calentar 100ml de <em>agua</em> en el microondas durante 1-2 minutos.",
"target": "Heat 100ml of <em>water</em> in the microwave for 1-2 minutes.",
"corpus": "paracrawl"
},
{
"id": "U7VAgocBATG92Ilvkwfa",
"source": "No utilice <em>agua</em>, jabón o alcohol para retirar la cera.",
"target": "Don't use <em>water</em>, soap or alcohol to remove the wax.",
"corpus": "paracrawl"
},
{
"id": "ELUrgocBmWHIBpHrgn7v",
"source": "Tome Doxycycline con un vaso de <em>agua</em> lleno (8 onzas).",
"target": "Take Doxycycline with a full glass of <em>water</em> (8 ounces).",
"corpus": "paracrawl"
},
{
"id": "gKgCgocBmWHIBpHrqiD6",
"source": "Ingerir un mínimo de 84 onzas de <em>agua</em> cada día.",
"target": "Ingest a minimum of 84 ounces of <em>water</em> every day.",
"corpus": "paracrawl"
},
{
"id": "VsZ1gocBATG92Ilv4MtB",
"source": "Situado en el <em>agua</em> delante, Bungalows proporcionar una excelente vista.",
"target": "Located on the <em>water</em> front, Bungalows provide an excellent view.",
"corpus": "paracrawl"
},
{
"id": "G6TfgocBLC2PqRpsb9fm",
"source": "Base móvil: Se puede llenar con <em>agua</em> o arena (aprox.",
"target": "Mobile base: Can be filled with <em>water</em> or sand (approx.",
"corpus": "paracrawl"
},
{
"id": "trUsgocBmWHIBpHrgt-z",
"source": "Un lugar tranquilo con posibilidad de <em>agua</em> grifos y descarga.",
"target": "A quiet place with possibility of <em>water</em> taps and discharge.",
"corpus": "paracrawl"
},
{
"id": "0TK5gIcBATG92Ilvzp1Q",
"source": "Marko parece un tiburón con sangre en el <em>agua</em>.",
"target": "Marko looks like a shark with blood in the <em>water</em>.",
"corpus": "open-subtitles"
},
{
"id": "sn2CgYcBmWHIBpHrR01B",
"source": "Otro desafío fue la presencia de <em>agua</em> en la roca.",
"target": "Another challenge was the presence of <em>water</em> in the rock.",
"corpus": "paracrawl"
},
{
"id": "_m5sgYcBATG92IlvwB99",
"source": "La región es muy fértil y bien provisto de <em>agua</em>.",
"target": "The region is very fertile and well supplied with <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "bKXigocBLC2PqRpsufTg",
"source": "Mezclar 2 dosificadores con 300-400ml de <em>agua</em> fría por servicio.",
"target": "Mix 2 scoops with 300-400ml of cold <em>water</em> per service.",
"corpus": "paracrawl"
},
{
"id": "YTq5gIcBmWHIBpHrz6uR",
"source": "Matty era bueno en todo, pero le encantaba el <em>agua</em>.",
"target": "Matty was good at everything, but he loved the <em>water</em>.",
"corpus": "open-subtitles"
},
{
"id": "p6MMgocBATG92IlvJL5O",
"source": "Lake Louise es un cuerpo largo y estrecho de <em>agua</em>.",
"target": "Lake Louise is a long and narrow body of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "OIikgYcBmWHIBpHrx8hz",
"source": "Tal vez, podría ser un filtro de aire o <em>agua</em>.",
"target": "Perhaps, it could be a filter for air or <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "r5rvgYcBATG92IlvlyJt",
"source": "Su calor y <em>agua</em> ya están incluidos en las evaluaciones.",
"target": "Your heat and <em>water</em> are already included in the assessments.",
"corpus": "paracrawl"
},
{
"id": "S5PbgYcBATG92Ilv95a_",
"source": "En hojas de Creta Mnemosyne <em>agua</em> está a la derecha.",
"target": "In sheets from Crete Mnemosyne <em>water</em> is to the right.",
"corpus": "paracrawl"
},
{
"id": "TnVsgYcBmWHIBpHrXvwX",
"source": "Lava el arroz glutinoso en <em>agua</em> dos o tres veces.",
"target": "Wash the sticky rice in <em>water</em> two or three times.",
"corpus": "paracrawl"
},
{
"id": "Eo7LgYcBATG92IlvtCHO",
"source": "La solución de ácido sulfúrico y <em>agua</em> se denomina electrolito.",
"target": "The solution of sulfuric acid and <em>water</em> is called electrolyte.",
"corpus": "paracrawl"
},
{
"id": "D2Y8gYcBmWHIBpHrHwkO",
"source": "Hay comida y <em>agua</em> en el fondo de la cueva.",
"target": "There's food and <em>water</em> in the back of the cave.",
"corpus": "open-subtitles"
},
{
"id": "jPkNg4cBATG92IlvLAx6",
"source": "Local comercial en ponferrada zona la cemba. extras: <em>agua</em>, luz.",
"target": "Commercial venue in ponferrada zone the cemba. extras: <em>water</em>, light.",
"corpus": "paracrawl"
},
{
"id": "1G9vgYcBATG92IlvpxEZ",
"source": "Acceso y control de la tierra, <em>agua</em> y recursos naturales.",
"target": "Access and control of the land, <em>water</em> and natural resources.",
"corpus": "paracrawl"
},
{
"id": "5fXQgIcBLC2PqRps3WWK",
"source": "No hay nada malo con el <em>agua</em> en este edificio.",
"target": "There is nothing wrong with the <em>water</em> in this building.",
"corpus": "open-subtitles"
},
{
"id": "gro6gocBmWHIBpHrtXwL",
"source": "Mantén el producto durante 5-10 minutos y aclara con <em>agua</em>.",
"target": "Keep the product for 5-10 minutes and rinse with <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "pq8xgocBATG92IlvePxs",
"source": "Juan también explica su bautismo es un bautismo de <em>agua</em>.",
"target": "John also explains his baptism is a baptism of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "XoWZgYcBmWHIBpHrUxzy",
"source": "El <em>agua</em> está ahí, reservada en los mares y océanos.",
"target": "The <em>water</em> is there, reserved in the seas and oceans.",
"corpus": "paracrawl"
},
{
"id": "Ru3TgocBmWHIBpHrEzw0",
"source": "Añadir un volumen de <em>agua</em> desionizada con suficiente agitación; 3.",
"target": "Add a volume of deionized <em>water</em> with sufficiently stirred; 3.",
"corpus": "paracrawl"
},
{
"id": "B3BbgYcBmWHIBpHrunlV",
"source": "En verano, lleva un sombrero y una botella de <em>agua</em>.",
"target": "In summer, bring a hat and a bottle of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "8CtzgYcBLC2PqRpssddP",
"source": "Siempre tome estos medicamentos con un vaso lleno de <em>agua</em>.",
"target": "Always take these medications with a full glass of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "PJDRgYcBATG92IlvqiO8",
"source": "Si la mezcla parece seca, añadir una cucharada de <em>agua</em>.",
"target": "If the mixture seems dry add a tablespoon of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "M2hbgYcBATG92IlvL0aW",
"source": "Sin embargo, la calidad de <em>agua</em> potable es muy importante.",
"target": "However, the quality of drinking <em>water</em> is very important.",
"corpus": "paracrawl"
},
{
"id": "4YusgYcBmWHIBpHrGzW6",
"source": "Y es muy legible en el <em>agua</em> con buena precisión.",
"target": "And it is very legible in the <em>water</em> with good precision.",
"corpus": "paracrawl"
},
{
"id": "r4KpgYcBATG92IlvqbxF",
"source": "Tiene cuatro ruinas y un molino de <em>agua</em> para recuperar.",
"target": "It has four ruins and a <em>water</em> mill to recover.",
"corpus": "paracrawl"
},
{
"id": "ty9_gYcBLC2PqRpsTaw8",
"source": "La botella debe ser pre-enjuagar con <em>agua</em> de la misma.",
"target": "The bottle should be pre-rinsed with <em>water</em> of the same.",
"corpus": "paracrawl"
},
{
"id": "Hn6GgYcBmWHIBpHryNQX",
"source": "Mojar con un buen vino tinto y muy poca <em>agua</em>.",
"target": "Moisten with a good red wine and very little <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "EnaGgYcBATG92Ilvi8m7",
"source": "Llena la piña con al menos una pulgada de <em>agua</em>.",
"target": "Fill the pineapple with at least an inch of <em>water</em>.",
"corpus": "paracrawl"
},
{
"id": "_DnPgIcBATG92Ilvy_Rv",
"source": "El <em>agua</em> es el mejor lugar para esconder un diamante.",
"target": "The <em>water</em> is the best place to hide a diamond.",
"corpus": "open-subtitles"
},
{
"id": "5IyugYcBmWHIBpHrtgrC",
"source": "Los ingredientes se mezclan con <em>agua</em> para formar una pasta.",
"target": "The ingredients are mixed with <em>water</em> to form a paste.",
"corpus": "paracrawl"
},
{
"id": "B3BAgocBLC2PqRpsvS8R",
"source": "Reacciona lentamente con el <em>agua</em> y es soluble en ácidos.",
"target": "It reacts slowly with <em>water</em> and is soluble in acids.",
"corpus": "paracrawl"
},
{
"id": "sZG_gYcBmWHIBpHr88zi",
"source": "Es este movimiento de <em>agua</em> el que crea la instalación.",
"target": "It is this movement of <em>water</em> that creates the installation.",
"corpus": "paracrawl"
},
{
"id": "Rrk2gocBmWHIBpHrMgc9",
"source": "Los aeropuertos de Yakarta y Singapur estarán bajo el <em>agua</em>.",
"target": "The airports of Jakarta and Singapore will be under <em>water</em>.",
"corpus": "paracrawl"
}
],
"translations": [
{
"translation": "water",
"count": 250432,
"isFacet": true
}
]
}
}
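To work with a response like the one above, something along these lines might do. It is a sketch based only on the fields visible in that response (`data.sentences[].source` and `.target`); the helper names are my own.

```python
import re
from urllib.parse import urlencode

EXPLORE_URL = "https://examples1.spanishdict.com/explore"


def build_explore_url(word, lang="es", count=100):
    """Build the second-request URL for a headword."""
    query = {"lang": lang, "q": word, "numExplorationSentences": count}
    return EXPLORE_URL + "?" + urlencode(query)


def sentence_pairs(payload):
    """Return (source, target) pairs with the <em> markers stripped."""
    pairs = []
    for sentence in payload["data"]["sentences"]:
        source = re.sub(r"</?em>", "", sentence["source"])
        target = re.sub(r"</?em>", "", sentence["target"])
        pairs.append((source, target))
    return pairs
```

Keep the `<em>` tags instead of stripping them if you want the headword highlighted in the final HTML.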
Then what about https://www.spanishdict.com/examples/pan?lang=es&page=12 ? What is the page parameter, and how do I find it?
https://examples1.spanishdict.com/explore?lang=es&q=pan&numExplorationSentences=100
Does it need to be changed to
https://examples1.spanishdict.com/explore?lang=es&q=pan&numExplorationSentences=100&&page=12
to fetch that page? I tried it, though, and the response did not change.
Paginated request:
https://examples1.spanishdict.com/sentences?lang=es&q=pan&page=6&pageSize=20
{
"params": {
"q": "pan",
"lang": "es",
"page": 6,
"pageSize": 20
},
"data": {
"totalHits": 10000,
"sentences": [
{
"id": "3Du7gIcBmWHIBpHrfC4e",
"source": "Mi vida se trata de poner <em>pan</em> sobre la mesa.",
"target": "My life is about thisputting <em>bread</em> on the table.",
"corpus": "open-subtitles"
},
{
"id": "8KDrgYcBmWHIBpHrPjiX",
"source": "En café del pueblo donde usted puede conseguir <em>pan</em> y bebida.",
"target": "In village cafe where you can get <em>bread</em> and drink.",
"corpus": "paracrawl"
},
{
"id": "SjOKgYcBLC2PqRpsXFDR",
"source": "El <em>pan</em> y la copa son dos símbolos separados.",
"target": "The <em>bread</em> and the cup are two separate symbols.",
"corpus": "paracrawl"
},
{
"id": "YJXLgYcBmWHIBpHrtq2i",
"source": "Sonrisa gustosa como la harina y buena como el <em>pan</em>.",
"target": "Pleasant smile as the flour and good as the <em>bread</em>.",
"corpus": "paracrawl"
},
{
"id": "HyVigYcBLC2PqRpsIvHb",
"source": "Les dice que Él es el <em>pan</em> de vida.",
"target": "He says that He is the <em>bread</em> of life.",
"corpus": "paracrawl"
},
{
"id": "KHtjgocBLC2PqRpsFtMV",
"source": "Pronuncia sobre el <em>pan</em> la oración de alabanza y bendición.",
"target": "He recites over the <em>bread</em> the prayer of praise and blessing.",
"corpus": "paracrawl"
},
{
"id": "jMyGgocBATG92Ilv5YtZ",
"source": "Carolina y Renée durante la celebración del compartir del <em>pan</em>.",
"target": "Carolina and Renée during the celebration for the <em>bread</em> sharing.",
"corpus": "paracrawl"
},
{
"id": "I2cmgocBLC2PqRps7V2M",
"source": "Si el <em>pan</em> se seca, agregar el agua en pequeños incrementos.",
"target": "If the <em>pan</em> gets dry, add water in small increments.",
"corpus": "paracrawl"
},
{
"id": "i3p5gYcBmWHIBpHrVmCa",
"source": "Se comió con el <em>pan</em> como una comida completa.",
"target": "He ate with the <em>bread</em> as a full meal.",
"corpus": "paracrawl"
},
{
"id": "TKLZgocBLC2PqRpstvqT",
"source": "Buen <em>pan</em> se hace con levadura, preferiblemente un bio cereales orgánicos.",
"target": "Good <em>bread</em> is made with yeast, preferably a bio organic cereals.",
"corpus": "paracrawl"
},
{
"id": "KTibgYcBLC2PqRpsLLYn",
"source": "Por lo tanto, debemos tomar este mensaje como nuestro <em>pan</em> espiritual.",
"target": "Therefore, we must take these messages as our spiritual <em>bread</em>.",
"corpus": "paracrawl"
},
{
"id": "8WIagocBLC2PqRpsL_cW",
"source": "Es justo hoy que bendecimos, descanso y compartir el <em>pan</em>.",
"target": "It is fitting today that we bless, break and share <em>bread</em>.",
"corpus": "paracrawl"
},
{
"id": "HnhzgYcBmWHIBpHrjHmJ",
"source": "Nos sentamos en la misma mesa, com-partimos el mismo <em>pan</em>.",
"target": "We sat around the same table, shared the same <em>bread</em>.",
"corpus": "paracrawl"
},
{
"id": "NHVpgYcBmWHIBpHr7T_5",
"source": "Si lo desea, puede freír el <em>pan</em> en la tostadora.",
"target": "If you like, you can fry the <em>bread</em> in the toaster.",
"corpus": "paracrawl"
},
{
"id": "Kj-ugYcBLC2PqRpsxkZY",
"source": "Todos deben ganar su <em>pan</em> diario sobre la tierra.",
"target": "All must earn their daily <em>bread</em> upon the earth.",
"corpus": "paracrawl"
},
{
"id": "eZXigYcBATG92IlvYbb0",
"source": "Normalmente es de autoservicio e incluye <em>pan</em>, café y cereales.",
"target": "It is usually self-service and includes <em>bread</em>, coffee, cereal.",
"corpus": "paracrawl"
},
{
"id": "3XeJgYcBATG92Ilvrdmr",
"source": "El <em>pan</em> del cielo es su carne y su sangre.",
"target": "The <em>bread</em> from heaven is his flesh and blood.",
"corpus": "paracrawl"
},
{
"id": "XQIpg4cBATG92IlvBV36",
"source": "Y ese gorjeo que haces cuando quieres más <em>pan</em>.",
"target": "And that chirp you do when you want more <em>bread</em>.",
"corpus": "subtitulos"
},
{
"id": "DHN7gYcBATG92IlvyCYF",
"source": "La bebida y el <em>pan</em> también están incluidos en el precio.",
"target": "The drink and <em>bread</em> are also included in the price.",
"corpus": "paracrawl"
},
{
"id": "NbAygocBATG92IlvFTbN",
"source": "Es más galletas picadas y corte el <em>pan</em> en rebanadas.",
"target": "It is further chopped biscuits and <em>bread</em> cut into slices.",
"corpus": "paracrawl"
}
]
}
}
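Walking through the pages of the `sentences` endpoint could be sketched like this. Two things are assumptions on my part: that page numbering starts at 1, and that an empty `sentences` list marks the end. The `fetch` parameter is only there so the loop can be tried without network access; the endpoint may also need the same headers or cookies as the browser.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SENTENCES_URL = "https://examples1.spanishdict.com/sentences"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def iter_sentences(word, lang="es", page_size=20, max_pages=3, fetch=fetch_json):
    """Yield sentence dicts page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):  # assumes pages are numbered from 1
        query = urlencode({"lang": lang, "q": word, "page": page, "pageSize": page_size})
        payload = fetch(SENTENCES_URL + "?" + query)
        sentences = payload["data"]["sentences"]
        if not sentences:
            break
        yield from sentences
```

For example, `list(iter_sentences("pan", max_pages=12))` would request pages 1 through 12 one after another.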
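OK, that works. So how do I scrape the images and audio?
Extract all the links, save them to a text file, and paste them into any tool that supports batch downloads, such as Xunlei (迅雷) or IDM. If there are a lot of links, I recommend aria2 for command-line batch downloading: it has plenty of configuration options and is easier than writing your own code.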
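The "extract the links and save them to a text file" step might look like the sketch below. It assumes the media URLs sit in the `src` attribute of `<img>`, `<audio>` and `<source>` tags, which may not hold for every page; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaLinkParser(HTMLParser):
    """Collect src attributes of <img>, <audio> and <source> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "audio", "source"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL
                self.links.append(urljoin(self.base_url, src))


def media_links(html, base_url):
    """Return absolute media URLs found in html, without duplicates."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(parser.links))


def write_url_list(links, path):
    """Write one URL per line, the format aria2's input file expects."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(links) + "\n")
```

The resulting file can be handed to aria2 with `aria2c -i urls.txt -d media -j 8`, where `-i` is the input file, `-d` the download directory, and `-j` the number of concurrent downloads.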
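How do I scrape the wordlists from this three-level index?
If the URL opens normally when you paste it into a browser, the site is not blocked; you have probably been identified as a crawler.
I just tried it: the site opens normally in the browser, but it simply cannot be scraped. Very strange...
Site:
You could try Playwright. It is somewhat more complex, but there are many anti-anti-scraping plugins on GitHub that help Playwright avoid being identified as a crawler or get past CAPTCHAs.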
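A bare-bones Playwright fetch, as a starting point only. It needs `pip install playwright` followed by `playwright install chromium`, and it does not include any of the anti-detection plugins, so a site that fingerprints headless browsers may still refuse it. The function name `fetch_rendered_html` is made up for this example.

```python
def fetch_rendered_html(url, wait_until="networkidle", timeout_ms=30000):
    """Load url in headless Chromium and return the rendered HTML."""
    # Imported here so the module still loads where Playwright is not installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles so scripted content is present
            page.goto(url, wait_until=wait_until, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML can be passed to BeautifulSoup just like `r.text` from requests.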