Nitro's

Python


Python Web Page Parsing Notes (Part 1)

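Language version: Python 2.7. Libraries: urllib2, chardet, BeautifulSoup (bs4).

Example code:

    import urllib2
    import chardet
    from bs4 import BeautifulSoup

    # Fetch the raw bytes of the page
    data = urllib2.urlopen('http://www.nitrohsu.com').read()

    # Guess the page's character encoding from those bytes
    encodeStr = chardet.detect(data)['encoding']

    # Parse with the stdlib parser, telling BeautifulSoup which encoding to use
    soup = BeautifulSoup(data, 'html.parser', from_encoding=encodeStr)

    # prettify() returns the parse tree as an indented string
    print soup.prettify()

chardet is a library that automatically detects the character encoding of a byte string, such as a downloaded web page. Its detect() function returns a dictionary, for example:

    {'confidence': 0.99, 'encoding': 'utf-8'}

confidence is how sure the detector is of its guess (from 0 to 1), and encoding is the name of the detected encoding.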
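The snippet above takes chardet's guess at face value, but it is only a guess: encoding can be None when chardet cannot decide (for example on empty input), and a low confidence value means the guess may be wrong. Below is a minimal Python 3 sketch (not part of the original 2.7 code) of a helper that decodes the page bytes with the detected encoding only when the detector is confident enough, and otherwise falls back to a default. The name decode_page, the 0.5 threshold and the UTF-8 fallback are illustrative assumptions. The detector is passed in as a parameter, so chardet.detect can be plugged in directly without the helper importing chardet itself.

```python
from typing import Any, Callable, Mapping

# A detector such as chardet.detect: takes bytes and returns a dict with
# at least 'encoding' (str or None) and 'confidence' (float from 0 to 1).
Detector = Callable[[bytes], Mapping[str, Any]]


def decode_page(raw: bytes, detect: Detector,
                min_confidence: float = 0.5,
                fallback: str = "utf-8") -> str:
    """Decode raw page bytes using a detector's guess.

    decode_page, min_confidence and fallback are hypothetical,
    illustrative choices, not values defined by chardet.
    """
    result = detect(raw)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Trust the detected encoding only if there is one and the detector
    # is reasonably sure about it.
    if encoding and confidence >= min_confidence:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or the guess does not fit these bytes.
            pass

    # Fall back to a fixed encoding, replacing undecodable bytes with
    # U+FFFD instead of raising an exception.
    return raw.decode(fallback, errors="replace")
```

With chardet and bs4 installed, it could be used as text = decode_page(urllib.request.urlopen(url).read(), chardet.detect), followed by soup = BeautifulSoup(text, 'html.parser'). Because BeautifulSoup then receives already-decoded text, it does not run its own encoding detection.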
Mar 12, 2013
Nitro's © 2022