【Notes】lxml Study Notes

Preface

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It's also very fast and memory friendly, just so you know. (GitHub)

Install the dependency

pip3 install lxml

Import the module

from lxml import etree

Create an Element object from an HTML string

  • While parsing the HTML string, lxml automatically fills in missing HTML tags

<html>: the HTML string to parse

html = etree.HTML("<html>")
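A minimal sketch of the tag auto-completion mentioned above. The input fragment is made up for illustration: it has no `<html>` or `<body>` wrapper and leaves `<p>` unclosed.

```python
from lxml import etree

# Illustrative fragment: no <html>/<body> wrapper and an unclosed <p>
fragment = "<p>hello"

# etree.HTML uses the lenient HTML parser and returns the root <html> element
root = etree.HTML(fragment)

print(root.tag)             # html
# The parser inserted <html> and <body> and closed the <p> tag
print(etree.tostring(root))  # b'<html><body><p>hello</p></body></html>'
```

Because the returned object is always the `<html>` root, XPath queries on it can start from `//` without worrying about whether the original string was a full page or just a fragment.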

Extract data with XPath

  • Returns a list for node-set expressions (an empty list if nothing matches)

<xpath>: an XPath expression

result = html.xpath("<xpath>")
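A small sketch of what the returned list contains for a few common kinds of XPath expression. The sample markup and variable names are hypothetical, chosen only to show the shapes of the results: elements, attribute values, text nodes, and the empty list for a miss.

```python
from lxml import etree

# Hypothetical sample markup, for illustration only
html = etree.HTML(
    "<ul>"
    '<li><a href="/a">A</a></li>'
    '<li><a href="/b">B</a></li>'
    "</ul>"
)

links = html.xpath("//a")           # list of Element objects
hrefs = html.xpath("//a/@href")     # list of attribute values (str subclasses)
texts = html.xpath("//a/text()")    # list of text nodes (str subclasses)
missing = html.xpath("//table")     # no match -> empty list, not None

print(len(links))  # 2
print(hrefs)       # ['/a', '/b']
print(texts)       # ['A', 'B']
print(missing)     # []

# Since the result is a list, guard before indexing when a match may be absent
first_href = hrefs[0] if hrefs else None
```

The guard on the last line matters in practice: indexing `[0]` into an empty result raises `IndexError`, which is a common surprise when a page's structure changes.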

Convert an Element object to a string

  • Returns bytes by default; pass encoding="unicode" to get a str

result = etree.tostring(html)

Done

References