[Selenium] Programming Tutorial: How to Scrape Historical Exchange Rates from the E.SUN Bank Website

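Disclaimer: We do our best to keep this article accurate, but it is not investment advice, and we cannot accept responsibility for any loss that results from using its content. If you have questions about the content, please consult a finance professional.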

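In this article, we use Selenium to scrape historical exchange rate data from the E.SUN Bank website. E.SUN Bank does provide a download link for its rate data, but here we skip that link and scrape the data from the page itself. Writing this program is good practice for scraping a table that spans multiple pages.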

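The program performs the following steps:

Go to the E.SUN Bank historical exchange rate page
Select a currency, such as USD
Select the spot rate (spot); readers may switch to the cash rate (cash) as needed
Select the time span, one year in this example
Select the display mode, the data table in this example
(Requires a crawler) Scrape the historical exchange rates from the page and write them to a CSV file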

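We suggest going through these steps by hand once to get a feel for the process. Because the code is somewhat long, the complete program is posted separately, and this article walks through it piece by piece.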

First, import the required packages:

import csv
import sys
import time

from selenium import webdriver

Set the parameters:

currency = "USD"

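To keep the example simple, the parameter is hard-coded here. In a real crawler, this value can be turned into a command-line parameter so that the program works as a small tool.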
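As a minimal sketch of that parameterization, the currency could be read from the command line with the standard library's argparse. The function name `parse_args`, the `--output` option, and the defaults are assumptions made for illustration; they are not part of the original program.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the scraper (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description="Scrape historical exchange rates into a CSV file."
    )
    parser.add_argument(
        "currency",
        nargs="?",
        default="USD",
        help="currency code to look up (default: USD)",
    )
    parser.add_argument(
        "--output",
        default=None,
        help="CSV path (default: <currency>toTWD.csv)",
    )
    args = parser.parse_args(argv)
    # Normalize the code so that "usd" and "USD" behave the same.
    args.currency = args.currency.upper()
    if args.output is None:
        args.output = f"{args.currency}toTWD.csv"
    return args
```

With this in place, `currency = parse_args().currency` would replace the hard-coded assignment, and passing an explicit list such as `parse_args(["JPY"])` makes the function easy to try out without a shell.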

Create the web driver object:

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

Go to the E.SUN Bank historical exchange rate page:

# Go to the E.SUN Bank historical data page.
driver.get("https://www.esunbank.com.tw/bank/personal/deposit/rate/forex/exchange-rate-chart")

# Wait for the page to finish loading.
time.sleep(10)
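A fixed `time.sleep(10)` either wastes time or is too short on a slow connection. One possible alternative, sketched here, is to poll until the element we need shows up. The helper name `wait_for_elements` and its default timings are assumptions for illustration; it only relies on the driver having a `find_elements(by, value)` method.

```python
import time


def wait_for_elements(driver, selector, timeout=10.0, interval=0.5):
    """Poll until `selector` matches at least one element, then return the matches.

    Raises TimeoutError if nothing matches within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        found = driver.find_elements("css selector", selector)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {selector!r} in {timeout}s")
        time.sleep(interval)
```

For example, `wait_for_elements(driver, ".transformSelect li span")` would return as soon as the currency menu exists. Selenium also ships its own `WebDriverWait` class together with the `expected_conditions` module, which is probably the better choice in production code; the hand-rolled loop above is only meant to show the idea.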

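Open the currency menu and pick the target currency:

# The find_element_by_* helpers were removed in Selenium 4.3, so we call
# find_element() directly; "css selector" is the value of By.CSS_SELECTOR.

# Open the currency menu.
currencyArrow = driver.find_element("css selector", ".transformSelect li span")
currencyArrow.click()

# Select the currency.
currencyOptions = driver.find_elements("css selector", ".transformSelectDropdown li span")
for c in currencyOptions:
    if currency in c.text:
        c.click()
        break

# Wait for the page to refresh.
time.sleep(3)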
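The loop above does nothing when no option contains the requested currency, so a typo in the currency code would go unnoticed and the program would scrape whatever currency is selected by default. A small helper can make that failure explicit. This is only a sketch; the name `click_first_match` is an assumption, and it works on any objects that have a `text` attribute and a `click()` method, as Selenium elements do.

```python
def click_first_match(elements, keyword):
    """Click the first element whose visible text contains `keyword`.

    Returns the matched text; raises ValueError when nothing matches,
    so a misspelled currency code fails loudly instead of silently.
    """
    for element in elements:
        text = element.text
        if keyword in text:
            element.click()
            return text
    raise ValueError(f"no option contains {keyword!r}")
```

It could be called as `click_first_match(currencyOptions, currency)` in place of the `for` loop.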

Select the spot rate (spot):

# "css selector" is the string value of By.CSS_SELECTOR.
spotBtn = driver.find_element("css selector", '.radioBtns [for="spot"]')
spotBtn.click()

# Wait for the page to refresh.
time.sleep(3)

Select the time span, one year in this example:

# Select the duration.
# "css selector" is the string value of By.CSS_SELECTOR.
durationBtn = driver.find_element("css selector", 'div [for="oneYear"]')
durationBtn.click()

# Wait for the page to refresh.
time.sleep(3)

Select the data table:

# Select the display mode.
# "css selector" is the string value of By.CSS_SELECTOR.
dataBtn = driver.find_element("css selector", '.radioBtns [for="data"]')
dataBtn.click()

# Wait for the page to refresh.
time.sleep(3)
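The last three steps all follow the same pattern: find a label by its `for` attribute, click it, and pause. If the repetition bothers you, it could be folded into one helper along the following lines. The name `click_label` and its parameters are assumptions for illustration; the selectors it builds are the same ones used in the steps above.

```python
import time


def click_label(driver, for_id, scope=".radioBtns", pause=3.0):
    """Click the <label> whose `for` attribute equals `for_id`, then pause."""
    selector = f'{scope} [for="{for_id}"]'
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", selector).click()
    # Give the page time to refresh, as the article does after every click.
    time.sleep(pause)
    return selector
```

The three steps would then read `click_label(driver, "spot")`, `click_label(driver, "oneYear", scope="div")`, and `click_label(driver, "data")`.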

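Walk through the table and collect the data. This step is the heart of the example:

sys.stderr.write("Writing data to CSV file...\n")

# newline="" lets the csv module control line endings.
with open(f"{currency}toTWD.csv", "w", newline="", encoding="utf-8") as csvfile:
    csvwriter = csv.writer(csvfile)

    csvwriter.writerow(["Date", "SellingRate", "BuyingRate"])

    hasMorePages = True

    while hasMorePages:
        # "css selector" is the string value of By.CSS_SELECTOR.
        items = driver.find_elements("css selector", "#inteTable tbody tr")

        for item in items:
            tds = item.find_elements("css selector", "td")

            # Skip rows that are not data rows.
            if len(tds) < 3 or tds[0].get_attribute("class") != "itemTtitle":
                continue

            csvwriter.writerow([tds[0].text, tds[1].text, tds[2].text])

        # The "next page" button carries the "active" class while more pages remain.
        nextBtn = driver.find_element("css selector", ".pageNumberBlock .down")
        if "active" in (nextBtn.get_attribute("class") or ""):
            nextBtn.click()
        else:
            hasMorePages = False

        # Wait for the next page of the table to load.
        time.sleep(1)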

time.sleep(4)

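The number of pages in a table like this is usually not fixed, so how do we know when to stop? There is no universal method; it has to be worked out page by page. In this example, when the crawler reaches the last page of the table, the class of nextBtn changes (it no longer contains active), and that change serves as the stopping signal.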
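The stopping rule described above can be separated from the row extraction. The sketch below wraps it in a generator and adds a page cap, so that a change in the site's markup cannot trap the crawler in an endless loop. The name `iter_pages` and the `max_pages` cap are assumptions made for illustration, not limits imposed by the site.

```python
import time


def iter_pages(driver, next_selector, max_pages=100, pause=1.0):
    """Yield 1, 2, 3, ... once per table page, clicking "next" in between.

    Stops when the next-page button no longer has the "active" class.
    `max_pages` is a safety cap (an assumption, not a site limit).
    """
    for page in range(1, max_pages + 1):
        yield page
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        next_btn = driver.find_element("css selector", next_selector)
        if "active" not in (next_btn.get_attribute("class") or ""):
            return
        next_btn.click()
        time.sleep(pause)
    raise RuntimeError(f"gave up after {max_pages} pages")
```

A caller would write `for page in iter_pages(driver, ".pageNumberBlock .down"):` and scrape the rows of the current page inside the loop body.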

Finally, do not forget to close the browser:

# Close the browser.
driver.quit()
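If any step before `driver.quit()` raises an exception, the browser is left running. One way to guard against that, sketched below, is a small context manager that always quits the driver. The name `browser` and its `factory` parameter are assumptions for illustration.

```python
from contextlib import contextmanager


@contextmanager
def browser(factory):
    """Create a driver by calling `factory` and always quit it on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even when scraping raises, so no browser process is left behind.
        driver.quit()
```

The scraping code would then live inside `with browser(webdriver.Chrome) as driver:`, and the explicit `driver.quit()` at the end becomes unnecessary.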

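About the Author

Holding a master's degree in an information-related field, ByteBard (位元詩人) believes that the purpose of developing applications is to bring value to society. If the software can also become a sustainable project along the way, that is a win-win for developers and users.

ByteBard likes to solve all kinds of problems with open source technology, but does not rule out proprietary technology when necessary. In spare time, ByteBard writes up lessons learned as articles and shares them on this website.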