Introduction
Selenium is a common Python tool for web scraping and browser automation. This case study is based on the Google Chrome implementation.
Download Drivers
Download Dependencies
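For the two steps above: with Selenium 4.6+, Selenium Manager downloads a matching ChromeDriver automatically, so installing the Python package is usually enough. A typical install command (assuming a standard Python environment with pip available):

```shell
pip install selenium
```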
Import Dependencies
from selenium import webdriver
Create Driver Object
- Creating the driver object automatically opens a new browser window, even if a browser window is already open.
driver = webdriver.Chrome()
Wait for the Browser to Fully Open
import time

driver = webdriver.Chrome()
time.sleep(5)
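A fixed sleep is fragile: it waits too long on fast pages and not long enough on slow ones. Selenium also provides explicit waits, which poll until a condition holds. A sketch (requires a running Chrome; the target page and element here are only illustrative):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com")
# Block for up to 10 seconds until the element is present, then return it;
# raises TimeoutException if it never appears.
elem = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "h1"))
)
```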
Automatic Destruction After Use
with webdriver.Chrome() as driver:
    ...
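Why the with-statement cleans up automatically: the context manager's __exit__ method runs when the block ends, even if the body raises an exception. A minimal stand-in class illustrating the pattern (FakeDriver and its quit method are illustrative, not Selenium's actual implementation):

```python
# Stand-in for a driver: __exit__ guarantees quit() runs on block exit.
class FakeDriver:
    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.quit()


with FakeDriver() as d:
    pass
print(d.closed)  # True: quit() ran when the with-block exited
```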
Destroy the Driver Object
- The browser window closes once the driver object is destroyed (for example, by calling driver.quit()).
Access URL
<url>
: the URL the browser should open.
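The navigation call itself, assuming the driver object created above (the <url> placeholder stays as-is):

```python
driver.get("<url>")  # navigates the browser to the URL; blocks until the page loads
```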
Find Elements
Import Dependencies
1
| from selenium.webdriver.common.by import By
|
ById
<id>
: ID of the HTML tag.
res = driver.find_element(by=By.ID, value="<id>")
Get Text Data
- res.text gets the innerText of the element: its rendered text content, without the HTML tags themselves.
Get Attribute Value
- Gets the attribute value of the HTML tag.
<key>
: Attribute name.
value = res.get_attribute("<key>")
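To make the attribute lookup concrete without a live browser, here is a stdlib sketch that mimics what get_attribute returns on a small HTML snippet. AttrFinder is a hypothetical helper built on html.parser, not part of Selenium:

```python
from html.parser import HTMLParser

# Hypothetical helper: collects the attributes of the first tag whose
# id matches, mimicking get_attribute("<key>") on an element found by id.
class AttrFinder(HTMLParser):
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if self.attrs is None and d.get("id") == self.target_id:
            self.attrs = d


html = '<a id="link" href="https://example.com">home</a>'
finder = AttrFinder("link")
finder.feed(html)
print(finder.attrs["href"])  # https://example.com
```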
Simulate Click
- res.click() simulates a mouse click on the element.
ByCSSSelector
.father .son
: elements matched by the CSS selector (here, elements with class son nested inside an element with class father).
res_list = driver.find_elements(by=By.CSS_SELECTOR, value=".father .son")
Get Data
- find_elements returns a list, so iterate over it to operate on each element.
for res in res_list:
    print(res.text)
Execute JS
<javascript>
: JS code to run in the page. If the code contains a return statement, execute_script returns that value to Python.
driver.execute_script("<javascript>")
Completion
References
Selenium official documentation (Chinese translation)
Original article in Simplified Chinese