An occasion came up where I needed to collect weather data.
The kinds and formats of weather data used in industry are surprisingly diverse. Among them, I learned of a format called epw. It is fairly fussy, but there is a place that provides weather data for 3,000 cities in 98 countries free of charge in this format. However, that site was built as a dynamic page.
So I had a chance to pull Selenium back out from where it had been quietly sleeping in the back of my memory.
In the process, I ended up using XPath (XML Path Language), which I hadn't really known before (I had just been blindly following along, or sticking only to the methods I already knew).
Rather than selecting by id, class, or tag as in DOM scripting, XPath selects an element by its location in the document tree, written as an absolute or relative path (much like a file-system path, not x/y screen coordinates). The keyword "Path" really does fit perfectly.
Thanks to that, even on a dynamic page with a list whose length and order change on every load, I could pick out one specific item (the a-tag element whose name in the list is epw) without having to run a loop.
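To make the "path" idea concrete, here is a minimal sketch that needs no browser. It uses Python's built-in xml.etree.ElementTree, which supports only a limited subset of XPath, and the markup in SAMPLE is a made-up stand-in for a download list, not the real site's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a download list on a page.
SAMPLE = """
<html>
  <body>
    <ul>
      <li><a class="file" href="/files/city.ddy">ddy</a></li>
      <li><a class="file" href="/files/city.epw">epw</a></li>
      <li><a class="file" href="/files/city.stat">stat</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# Full path from the root: every step of the tree is spelled out.
all_links = root.findall("./body/ul/li/a")

# Relative search: any <a> at any depth whose text is exactly "epw".
epw_link = root.find(".//a[.='epw']")

print(len(all_links))        # 3
print(epw_link.get("href"))  # /files/city.epw
```

The second query keeps working if the list items are reordered or more are added, which is why no loop over the list is needed.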
Selenium has also changed a great deal since I last used it. If the Chrome browser you normally use is version 115 or above, you no longer need to install a separate driver program: recent Selenium releases find or download a matching chromedriver on their own.
Main points:
1. Dependency (package) management
`pip install selenium`
`pip install webdriver-manager`
2. Basic setup (import)
import os
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
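# Set up the driver and load the web page
# chrome_path is only needed if you manage chromedriver manually; it is unused below
chrome_path = "C:/Users/userName/chromedriver.exe"
driver = webdriver.Chrome()  # Selenium locates a matching driver on its own
page_url = "URL of the target page to scrape"
driver.get(page_url)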
3. Selecting only one specific item (the a-tag element whose name in the list is epw) on a dynamic page, from a list whose length and order change on every load
1) The expression is written in XPath syntax, but it is used in much the same way as a DOM-script selector. Example:
download_button = driver.find_element(By.XPATH, '//a[@class="class name declared on the a tag you want"]')
2) Example that matches on the link's visible text (this uses Selenium's partial link text locator rather than an XPath expression):
download_button = driver.find_element(By.PARTIAL_LINK_TEXT, 'epw')
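The same kind of partial-text match can also be written as an XPath expression. The sketch below only builds the expression string, so it runs without a browser. The helper name xpath_for_link_text is my own invention for illustration, not part of Selenium.

```python
def xpath_for_link_text(fragment: str) -> str:
    """Build an XPath matching any <a> whose text contains `fragment`."""
    # Kept simple on purpose: a fragment containing a double quote would
    # break the expression, so reject it rather than escape it.
    if '"' in fragment:
        raise ValueError("fragment must not contain a double quote")
    return f'//a[contains(normalize-space(.), "{fragment}")]'


epw_xpath = xpath_for_link_text("epw")
print(epw_xpath)
```

The resulting string can be passed to the driver as driver.find_element(By.XPATH, epw_xpath). It should behave roughly like the partial link text locator, with one difference: XPath looks at the text in the markup, while the link text locator matches the text as rendered.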
Reference materials: most of what you need seems to be covered in the links below.
What is XPATH? Easily selecting elements with Selenium XPath!
How to use Python Selenium — how to select a specific element?!
01. selenium 4
selenium: https://www.selenium.dev/documentation/webdriver/
