{"id":20,"date":"2024-07-20T16:22:33","date_gmt":"2024-07-20T08:22:33","guid":{"rendered":"https:\/\/www.ironbar.cn\/?p=20"},"modified":"2024-09-21T14:09:20","modified_gmt":"2024-09-21T06:09:20","slug":"%e7%94%a8python%e8%8e%b7%e5%8f%96www-ironbar-cn%e7%9a%84%e5%9b%be%e7%89%87","status":"publish","type":"post","link":"https:\/\/www.ironbar.cn\/index.php\/2024\/07\/20\/%e7%94%a8python%e8%8e%b7%e5%8f%96www-ironbar-cn%e7%9a%84%e5%9b%be%e7%89%87\/","title":{"rendered":"Fetching images from www.ironbar.cn with Python"},"content":{"rendered":"\n<p>To fetch the images on the www.ironbar.cn home page with Python's requests module, we need the following steps:<br><br>1. Send an HTTP request to the site and retrieve the page content.<br>2. Parse the page content and extract the URL of every image.<br>3. Download or otherwise process the images.<br><br>The following Python code is a concrete example:<br><br><code>```python<br>import requests<br>from bs4 import BeautifulSoup<br>import os<br><br># Set the request headers<br>headers = {<br>    'User-Agent': 'Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/91.0.4472.124 Safari\/537.36',<br>    'Accept-Language': 'en-US,en;q=0.5',<br>    'Accept-Encoding': 'gzip, deflate',<br>    'Connection': 'keep-alive',<br>    'Accept': 'text\/html,application\/xhtml+xml,application\/xml;q=0.9,image\/webp,*\/*;q=0.8'<br>}<br><br># Fetch the page content (the timeout keeps the script from hanging)<br>url = 'http:\/\/www.ironbar.cn'<br>response = requests.get(url, headers=headers, timeout=10)<br>response.raise_for_status()<br><br># Parse the page content<br>soup = BeautifulSoup(response.text, 'html.parser')<br><br># Extract the URL of every image<br>image_urls = []<br>for img_tag in 
soup.find_all('img'):<br>    img_url = img_tag.get('src')<br>    if img_url:<br>        # Resolve relative paths against the page URL<br>        if not img_url.startswith('http'):<br>            img_url = requests.compat.urljoin(url, img_url)<br>        image_urls.append(img_url)<br><br># Print every image URL<br>for img_url in image_urls:<br>    print(img_url)<br><br># Optional: download every image to a local directory<br>output_dir = 'images'<br>os.makedirs(output_dir, exist_ok=True)<br><br>for img_url in image_urls:<br>    img_response = requests.get(img_url, headers=headers, timeout=10)<br>    img_response.raise_for_status()  # do not save error pages as images<br>    # Build the file name from the URL path only, without any query string<br>    img_name = os.path.join(output_dir, os.path.basename(requests.compat.urlparse(img_url).path))<br>    with open(img_name, 'wb') as img_file:<br>        img_file.write(img_response.content)<br>    print(f'Downloaded {img_name}')<br>```<\/code><br><br>### Code walkthrough:<br><br>1. **Request headers**: `headers` imitates a browser request so that the server is less likely to reject it.<br>2. **Fetching the page**: `requests.get` retrieves the page content, and BeautifulSoup parses the HTML.<br>3. **Extracting image URLs**: BeautifulSoup finds every `&lt;img>` tag and reads its `src` attribute. If an image URL is a relative path, `requests.compat.urljoin` converts it to an absolute URL.<br>4. **Printing image URLs**: every image URL is printed.<br>5. 
**Downloading the images (optional)**: every image is downloaded to the local `images` directory.<br><br>### Requirements:<br>&#8211; Python 3.x<br>&#8211; The `requests` and `beautifulsoup4` libraries, which can be installed with the following command:<br><code>```bash<br>pip install requests beautifulsoup4<br>```<\/code><br><br>Running the code above fetches and downloads the images found on the www.ironbar.cn home page.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To fetch the images on the www.ironbar.cn home page with Python's requests module, we need [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[2],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-item","tag-python"],"_links":{"self":[{"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/posts\/20","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/comments?post=20"}],"version-history":[{"count":1,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/posts\/20\/revisions"}],"predecessor-version":[{"id":21,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/posts\/20\/revisions\/21"}],"wp:attachment":[{"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\
/media?parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/categories?post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ironbar.cn\/index.php\/wp-json\/wp\/v2\/tags?post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}