【Notes】lxml Study Notes
Preface
lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It’s also very fast and memory friendly, just so you know. (Source: lxml on GitHub)
Install the dependency
```shell
pip3 install lxml
```
Import the dependency
```python
from lxml import etree
```
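Create an Element object from an HTML string
- While parsing the HTML string, lxml automatically fills in missing HTML tags (such as `<html>` and `<body>`)

`<html>`: the HTML string to parse
```python
# Parse an HTML string; returns the root <html> Element
html = etree.HTML("<html>")
```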
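A minimal sketch of the tag completion described above. The input fragment `<p>data` is invented for illustration: it has no `<html>` or `<body>` and leaves `<p>` unclosed, and lxml's HTML parser supplies the missing structure.

```python
from lxml import etree

# An HTML fragment with no <html>/<body> wrapper and an unclosed <p>
fragment = "<p>data"

# etree.HTML parses the string and returns the root <html> Element
root = etree.HTML(fragment)

print(root.tag)                       # html
print([child.tag for child in root])  # ['body']
# The serialized tree now contains the tags the parser filled in
print(etree.tostring(root))           # b'<html><body><p>data</p></body></html>'
```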
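Extract data with XPath
- Returns a list of matches (Elements, or strings for `text()` and `@attr` paths), or an empty list if nothing matches; scalar expressions such as `count(...)` return a single value instead

`<xpath>`: the XPath expression
```python
# Evaluate an XPath expression against the parsed tree
result = html.xpath("<xpath>")
```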
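A short sketch of what the returned list contains for a few common kinds of XPath expressions. The `<ul>` markup is a made-up sample; the calls are standard `etree.HTML` and `Element.xpath`.

```python
from lxml import etree

# Sample markup, invented for this example
page = etree.HTML("""
<ul>
  <li><a href="/a">A</a></li>
  <li><a href="/b">B</a></li>
</ul>
""")

hrefs = page.xpath("//a/@href")    # attribute values -> list of strings
texts = page.xpath("//a/text()")   # text nodes -> list of strings
items = page.xpath("//li")         # element nodes -> list of Elements
missing = page.xpath("//table")    # no match -> empty list, not None

print(hrefs)       # ['/a', '/b']
print(texts)       # ['A', 'B']
print(len(items))  # 2
print(missing)     # []

# xpath() works on any Element; "./" makes the path relative to that element
labels = [li.xpath("./a/text()")[0] for li in items]
print(labels)      # ['A', 'B']
```

Calling `xpath()` on a sub-element with a `./` path is a convenient way to pull several fields out of each repeated item (one `<li>` per record) without re-querying the whole document.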
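Convert an Element object to a string
```python
# Serialize the tree; returns bytes by default (pass encoding="unicode" to get a str)
result = etree.tostring(html)
```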
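Because `etree.tostring` returns `bytes` unless told otherwise, here is a small sketch of the usual options: `encoding="unicode"` for a Python `str`, and `method="html"` with `pretty_print=True` for HTML-style, line-broken output. The `<p>data</p>` input is illustrative only.

```python
from lxml import etree

root = etree.HTML("<p>data</p>")

# Default: bytes, serialized with XML rules
as_bytes = etree.tostring(root)

# encoding="unicode" returns a str instead of bytes
as_text = etree.tostring(root, encoding="unicode")

# method="html" uses HTML serialization rules; pretty_print adds line breaks
pretty = etree.tostring(root, encoding="unicode", method="html", pretty_print=True)

print(type(as_bytes).__name__)  # bytes
print(as_text)                  # <html><body><p>data</p></body></html>
print(pretty)
```

Decoding the default result with `as_bytes.decode("utf-8")` gives the same text as `encoding="unicode"` for plain ASCII content like this.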