Automated testing: log in with Selenium, then fetch the content with Requests

This article uses Selenium to log in to a website automatically and take a screenshot, then uses Requests to fetch page content that is only available after login. First, a brief introduction to the two tools:

  • Selenium: an umbrella project for a range of tools and libraries that enable and support web browser automation.
  • Requests: an HTTP library for Python which, in its own words, is "the only Non-GMO HTTP library for Python, safe for human consumption".

Why choose Selenium for automatic login?

Driving the browser with Selenium is equivalent to a user manually opening the browser and logging in.

Compared with logging in through direct HTTP requests, this has several advantages:

  1. It sidesteps the complexity of the login form (iframes, AJAX, and so on), so there is no need to analyze how the form works internally.
    • With Selenium, you simply script the steps a user would perform.
  2. It avoids having to forge headers, track cookies, and handle the other HTTP details of logging in.
    • Selenium relies on the browser itself to take care of these.
  3. It makes it easy to wait for pages to load, detect special cases (such as login verification), and add further logic.

In addition, watching the login and the following steps play out in a real browser window tends to impress non-technical observers.

Why choose Requests to fetch the web content?

The goal is to fetch some content after login, not to crawl the whole website. Requests is sufficient for that and easy to use.

1) Prepare Selenium

Basic environment: Python 3.7.4 (anaconda3-2019.10)

Install Selenium with pip:

pip install selenium

Check the installed Selenium version:

$ python
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import selenium
>>> print('Selenium version is {}'.format(selenium.__version__))
Selenium version is 3.141.0

2) Prepare the browser and its driver

Download and install Google Chrome:
https://www.google.com/chrome/

Download the Chromium/Chrome WebDriver (ChromeDriver) that matches your browser version:
https://chromedriver.storage.googleapis.com/index.html

Then add the directory that contains the WebDriver to PATH, for example:

# macOS, Linux
echo 'export PATH=$PATH:/opt/WebDriver/bin' >> ~/.profile

# Windows
setx /m path "%path%;C:\WebDriver\bin\"
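
Before starting the browser, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch, assuming the executable is named chromedriver; find_driver is a hypothetical helper for illustration, not part of Selenium:

```python
import shutil


def find_driver(name="chromedriver", path=None):
    """Return the full path of the driver executable, or None if not found.

    `path` optionally overrides the search path (an os.pathsep-separated
    string); when it is None, the current PATH environment variable is used.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("chromedriver not found on PATH")
    else:
        print("chromedriver found at {}".format(location))
```

If the lookup returns None in a fresh terminal, the PATH change has probably not been picked up yet by that shell.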

3) Write the code

Read the login configuration

The login credentials are private, so we read them from a JSON configuration file:

# load config
import json
from types import SimpleNamespace as Namespace

secret_file = 'secrets/douban.json'
# {
#   "url": {
#     "login": "https://www.douban.com/",
#     "target": "https://www.douban.com/mine/"
#   },
#   "account": {
#     "username": "username",
#     "password": "password"
#   }
# }
with open(secret_file, 'r', encoding='utf-8') as f:
  config = json.load(f, object_hook=lambda d: Namespace(**d))

login_url = config.url.login
target_url = config.url.target
username = config.account.username
password = config.account.password
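
To see why the attribute access above works, here is a small self-contained sketch that applies the same object_hook to an in-memory JSON string. The values are the placeholders from the commented structure above, not real credentials:

```python
import json
from types import SimpleNamespace

# Placeholder values mirroring the structure of secrets/douban.json.
sample = """
{
  "url": {
    "login": "https://www.douban.com/",
    "target": "https://www.douban.com/mine/"
  },
  "account": {"username": "username", "password": "password"}
}
"""

# object_hook is called for every decoded JSON object, innermost first,
# so nested objects are already namespaces when the outer one is built.
config = json.loads(sample, object_hook=lambda d: SimpleNamespace(**d))

print(config.url.login)         # https://www.douban.com/
print(config.account.username)  # username
```

One caveat of this approach: keys that are not valid Python identifiers cannot be read with dot syntax and would need getattr instead.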

Log in automatically with Selenium

The example uses the Chrome WebDriver, and the site used to test the login is Douban.

Open the login page, then enter the user name and password automatically to log in:

# automated testing
from selenium import webdriver

# Chrome Start
opt = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=opt)
# Chrome opens with "Data;" with selenium
#   https://stackoverflow.com/questions/37159684/chrome-opens-with-data-with-selenium
# Chrome End

# driver.implicitly_wait(5)

import sys  # needed for sys.exit() below

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 5)

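print('open login page ...')
driver.get(login_url)
# the login form is inside the first iframe on the page
driver.switch_to.frame(driver.find_elements_by_tag_name("iframe")[0])

# note: find_element_by_* is the Selenium 3 API used throughout this article;
#   Selenium 4 replaces it with driver.find_element(By.NAME, 'username') etc.
# select the account tab, fill in the credentials, and submit the form
driver.find_element_by_css_selector('li.account-tab-account').click()
driver.find_element_by_name('username').send_keys(username)
driver.find_element_by_name('password').send_keys(password)
driver.find_element_by_css_selector('.account-form .btn').click()
try:
  # wait up to 5 seconds for the element with id "content" to appear
  wait.until(EC.presence_of_element_located((By.ID, "content")))
except TimeoutException:
  driver.quit()
  sys.exit('open login page timeout')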
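
Conceptually, wait.until() keeps calling the condition at a fixed interval until it returns a truthy value or the timeout expires. The sketch below illustrates that polling idea in plain Python. The names wait_until and WaitTimeout are hypothetical and not Selenium API, and the real WebDriverWait additionally ignores certain exceptions while polling:

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have
    passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within {}s".format(timeout))
        time.sleep(poll)


if __name__ == "__main__":
    # Stand-in for "element is present": becomes truthy on the third poll.
    attempts = []

    def ready():
        attempts.append(1)
        return len(attempts) >= 3

    print(wait_until(ready, timeout=2.0, poll=0.01))  # True
```

This is also why an explicit wait is usually preferable to a fixed sleep: it returns as soon as the condition holds and only pays the full timeout in the failure case.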

With Internet Explorer, the driver setup looks like this instead:

# Ie Start
# Selenium Click is not working with IE11 in Windows 10
#   https://github.com/SeleniumHQ/selenium/issues/4292
opt = webdriver.IeOptions()
opt.ensure_clean_session = True
opt.ignore_protected_mode_settings = True
opt.ignore_zoom_level = True
opt.initial_browser_url = login_url
opt.native_events = False
opt.persistent_hover = True
opt.require_window_focus = True
driver = webdriver.Ie(options=opt)
# Ie End

To set additional capabilities, you can do the following:

cap = opt.to_capabilities()
cap['acceptInsecureCerts'] = True
cap['javascriptEnabled'] = True

Open the target page and take a screenshot

print('open target page ...')
driver.get(target_url)
try:
  wait.until(EC.presence_of_element_located((By.ID, "board")))
except TimeoutException:
  driver.quit()
  sys.exit('open target page timeout')

# save screenshot
driver.save_screenshot('target.png')
print('saved to target.png')

Copy the cookies to Requests and request the HTML

# save html
import requests

requests_session = requests.Session()
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
requests_session.headers.update({"user-agent": selenium_user_agent})
for cookie in driver.get_cookies():
  requests_session.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

# driver.delete_all_cookies()
driver.quit()

resp = requests_session.get(target_url)
resp.encoding = resp.apparent_encoding
# resp.encoding = 'utf-8'
print('status_code = {0}'.format(resp.status_code))
with open('target.html', 'w', encoding='utf-8') as fout:
  fout.write(resp.text)

print('saved to target.html')
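
The loop above copies only the name, value, and domain of each cookie. If the site also depends on the path, the secure flag, or the expiry time, those fields can be carried over as well. The following is a sketch under assumptions: cookie_kwargs is a hypothetical helper, the sample cookie is made up, and the keyword names (domain, path, secure, expires) are the ones the Requests cookie jar is assumed to accept in set():

```python
def cookie_kwargs(cookie):
    """Map a Selenium-style cookie dict to keyword arguments for a cookie jar.

    Only keys present in the input are emitted; Selenium's "expiry" is
    renamed to "expires". Other keys (such as "httpOnly") are dropped.
    """
    kwargs = {}
    for src, dst in (("domain", "domain"), ("path", "path"),
                     ("secure", "secure"), ("expiry", "expires")):
        if src in cookie:
            kwargs[dst] = cookie[src]
    return kwargs


if __name__ == "__main__":
    # Made-up cookie in the shape returned by driver.get_cookies().
    sample = {"name": "sid", "value": "abc123", "domain": ".example.com",
              "path": "/", "secure": True, "httpOnly": True,
              "expiry": 1600000000}
    print(cookie_kwargs(sample))
    # Intended use (assumes `requests_session` from the code above):
    #   requests_session.cookies.set(sample["name"], sample["value"],
    #                                **cookie_kwargs(sample))
```

Keeping the mapping in a pure function makes it easy to check without starting a browser.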

4) Run the test

You can temporarily add the WebDriver directory to PATH for the current shell session:

# macOS, Linux
export PATH=$(pwd)/drivers:$PATH

# Windows
set PATH=%cd%\drivers;%PATH%
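
The same effect can be achieved from inside the script, which avoids depending on how the shell was set up. This is a minimal sketch, assuming the driver binaries live in a drivers directory under the current working directory as in the commands above; prepend_to_path is a hypothetical helper:

```python
import os


def prepend_to_path(directory, environ=None):
    """Put `directory` at the front of PATH for the current process only."""
    env = os.environ if environ is None else environ
    directory = os.path.abspath(directory)
    current = env.get("PATH", "")
    env["PATH"] = directory + os.pathsep + current if current else directory
    return env["PATH"]


if __name__ == "__main__":
    # Assumes the driver binaries live in ./drivers.
    prepend_to_path(os.path.join(os.getcwd(), "drivers"))
    print(os.environ["PATH"].split(os.pathsep)[0])
```

The call has to run before the WebDriver object is created, and the change disappears when the Python process exits.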

Run the Python script. The output looks like this:

$ python douban.py
Selenium version is 3.141.0
--------------------------------------------------------------------------------
open login page ...
open target page ...
saved to target.png
status_code = 200
saved to target.html

The screenshot is saved as target.png and the HTML content as target.html.

Epilogue

What if the login process runs into a verification step?

  1. Slider verification can be simulated with Selenium.
    • The sliding distance can be determined with an image gradient algorithm.
  2. Image verification can be recognized with a Python AI library.

References

The code for this article is available as a Gist:
https://gist.github.com/ikuokuo/1160862c154d550900fb80110828c94c

Posted on Sun, 31 May 2020 18:01:20 -0700 by plapeyre