Methods to manipulate entries in a list
I've hit my next roadblock. I've retrieved the URLs for the images I'd like to download. The problem is that they carry parameters that shrink the images to thumbnail size:
['https://www.lego.com/cdn/cs/set/assets/blt92a894b291b4c966/21054.jpg?fit=bounds&format=jpg&quality=80&width=65&height=45&dpr=1',
...
'https://www.lego.com/cdn/cs/set/assets/bltea2ebe53c7c18194/21054_alt14.jpg?fit=bounds&format=jpg&quality=80&width=65&height=45&dpr=1']
I'd like to strip the "[" at the beginning, the "]" at the end, and everything after the "?" in each link.
I tried to use strip, but that didn't work because it's a list.
I then read somewhere to use pandas, and that's making my head spin. Specifically, how is the value of each row in the column passed to a variable? Also, any pointers regarding how to strip the aforementioned characters would be great. I'm still tinkering with it.
Complete Code for reference:
import io
from os import link
from re import search
from typing import Counter
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import time
import bs4
import os
import wget
import requests
from PIL import Image
import pandas as pd
set_number = "21054"
#specify the path to chromedriver.exe (download and save on your computer)
driver = webdriver.Chrome('/Users/ibrahiemk/Downloads/chromedriver')
#open the webpage
driver.get("http://shop.lego.com")
#alert 1
button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="root"]/div[5]/div/div/div[1]/div[1]/div/button'))).click()
#Button
button2 = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[normalize-space()='Just Necessary']"))).click()
#target Search
search = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='root']/div[2]/header/div[2]/div[2]/div/div[5]/div/button"))).click()
searchbox = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='desktop-search-search-input']")))
searchbox.send_keys(set_number)
#Click the resulting set
searchbox = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='desktop-search-search-suggestions']/li/a/div"))).click()
anchors = driver.find_elements(By.XPATH, '//*[@id="main-content"]/div/div[1]/div/div[1]/div[1]/div/div/div/div[2]/div/div/div/ol/li/button/img')
links = [a.get_attribute('src') for a in anchors]
df = pd.DataFrame(links, columns=['links'])
df ['links'] = df['links'].str.rstrip('?.*$')
links = links.values[0]
print(df)
I know the df stuff is broken, still tinkering with it. TIA!
Solution 1:[1]
You can do something like this:
urls = ["https://example.com/bar1.jpg?query-string",
"https://example.com/bar2.jpg?query-string",
"https://example.com/bar3.jpg?query-string"]
stripedUrls = []
for url in urls:
    stripedUrl = url.split("?")[0]
    stripedUrls.append(stripedUrl)
Or maybe the following one-liner if you prefer:
stripedUrls = [url.split("?")[0] for url in urls]
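Note that the "[" and "]" in the printed output are not part of any URL. They are just how Python displays a list, so there is nothing to strip. Here is a minimal sketch that applies the same split to two of the URLs from the question; the name clean_links is only illustrative, and whether the site serves the full-size image once the parameters are removed is an assumption I haven't verified:

```python
# Two of the thumbnail URLs from the question.
links = [
    "https://www.lego.com/cdn/cs/set/assets/blt92a894b291b4c966/21054.jpg?fit=bounds&format=jpg&quality=80&width=65&height=45&dpr=1",
    "https://www.lego.com/cdn/cs/set/assets/bltea2ebe53c7c18194/21054_alt14.jpg?fit=bounds&format=jpg&quality=80&width=65&height=45&dpr=1",
]

# Keep only the part before the first "?" in each URL.
# clean_links is a hypothetical name, not something from the original code.
clean_links = [link.split("?")[0] for link in links]

# Looping over the list hands you one plain string at a time,
# so each URL can be passed to whatever does the download.
for link in clean_links:
    print(link)
```

Inside the loop, link is an ordinary str, which is why string methods work there even though they don't work on the list itself.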
For more ways to strip a query string, see How do I remove a query string from URL using Python.
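One such alternative, as a rough sketch, is to let the standard library's urllib.parse take the URL apart instead of splitting on "?" by hand. The helper name strip_query is made up for this example; it also drops any "#fragment", which may or may not be what you want:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = ["https://example.com/bar1.jpg?query-string",
        "https://example.com/bar2.jpg?query-string",
        "https://example.com/bar3.jpg?query-string"]

stripped = [strip_query(url) for url in urls]
print(stripped)
```

For URLs like the ones in the question this should give the same result as split("?")[0]; the parsing approach mainly helps if some links carry a fragment or you later need the individual parts.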
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andreas |
