How to authenticate to Wikimedia Commons Query Service using OAuth in Python?
I am trying to use the Wikimedia Commons Query Service[1] programmatically from Python, but I am having trouble authenticating via OAuth 1.
Below is a self-contained Python example that does not work as expected. The expected behaviour is that a result set is returned, but instead the HTML of the login page is returned. You can install the dependencies with pip install --user sparqlwrapper oauthlib certifi. The script should then be given the path to a text file containing the pasted output shown after applying for an owner-only token[2], e.g.
Consumer token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Consumer secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
[1] https://wcqs-beta.wmflabs.org/ ; https://diff.wikimedia.org/2020/10/29/sparql-in-the-shadow-of-structured-data-on-commons/
[2] https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers
import sys
from SPARQLWrapper import JSON, SPARQLWrapper
import certifi
from SPARQLWrapper import Wrapper
from functools import partial
from oauthlib.oauth1 import Client
ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
?file wdt:P180 wd:Q42 .
}
"""
def monkeypatch_sparqlwrapper():
    # Deal with old system certificates
    if not hasattr(Wrapper.urlopener, "monkeypatched"):
        Wrapper.urlopener = partial(Wrapper.urlopener, cafile=certifi.where())
        setattr(Wrapper.urlopener, "monkeypatched", True)

def oauth_client(auth_file):
    # Read credential from file
    creds = []
    for idx, line in enumerate(auth_file):
        # Even-numbered lines (0, 2, ...) are labels; odd-numbered lines are values
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return Client(*creds)

class OAuth1SPARQLWrapper(SPARQLWrapper):
    # OAuth sign SPARQL requests
    def __init__(self, *args, **kwargs):
        self.client = kwargs.pop("client")
        super().__init__(*args, **kwargs)

    def _createRequest(self):
        request = super()._createRequest()
        uri = request.get_full_url()
        method = request.get_method()
        body = request.data
        headers = request.headers
        new_uri, new_headers, new_body = self.client.sign(uri, method, body, headers)
        request.full_url = new_uri
        request.headers = new_headers
        request.data = new_body
        print("Sending request")
        print("Url", request.full_url)
        print("Headers", request.headers)
        print("Data", request.data)
        return request

monkeypatch_sparqlwrapper()
client = oauth_client(open(sys.argv[1]))
sparql = OAuth1SPARQLWrapper(ENDPOINT, client=client)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print("Results")
print(results)
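The oauth_client helper above picks credentials by line position, so a stray blank line or a reordered paste silently produces the wrong key/secret pairing. A minimal sketch of reading the same file by label instead follows. It assumes the labels appear exactly as in the example file shown earlier; read_credentials and EXPECTED_LABELS are hypothetical names, not part of any library.

```python
import io

# Assumption: labels are spelled exactly as in the example credential file.
EXPECTED_LABELS = ("Consumer token", "Consumer secret", "Access token", "Access secret")


def read_credentials(auth_file):
    """Return the four values in the order oauthlib's Client takes them positionally."""
    # Drop blank lines so an accidental empty line does not shift the pairing.
    lines = [line.strip() for line in auth_file if line.strip()]
    # Pair every label line with the value line that follows it.
    pairs = dict(zip(lines[0::2], lines[1::2]))
    missing = [label for label in EXPECTED_LABELS if label not in pairs]
    if missing:
        raise ValueError("missing credential(s): " + ", ".join(missing))
    return [pairs[label] for label in EXPECTED_LABELS]


# Made-up placeholder values, only to show the call.
sample = io.StringIO(
    "Consumer token\nck\nConsumer secret\ncs\nAccess token\nat\nAccess secret\nas\n"
)
print(read_credentials(sample))
```

The returned list can be passed as Client(*creds), in the same way the original helper does.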
I have also tried without SPARQLWrapper, using just requests + requests_oauthlib. However, I get the same problem: the HTML of a login page is returned. So it seems it might actually be a problem with the Wikimedia Commons Query Service.
import sys
import requests
from requests_oauthlib import OAuth1
def oauth_client(auth_file):
    creds = []
    for idx, line in enumerate(auth_file):
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return OAuth1(*creds)
ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
?file wdt:P180 wd:Q42 .
}
"""
r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    auth=oauth_client(open(sys.argv[1])),
    headers={"Accept": "application/sparql-results+json"}
)
print(r.text)
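Since the symptom is an HTML login page arriving where SPARQL JSON was expected, it can help to classify the response before trying to parse it. The following is only a sketch: describe_response is a hypothetical helper, and the media types it checks are assumptions about what the service would send on success.

```python
def describe_response(content_type, body):
    """Classify a response as 'sparql-json', 'html', or 'other'."""
    # Strip parameters such as "; charset=utf-8" from the Content-Type value.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/sparql-results+json", "application/json"):
        return "sparql-json"
    # Fall back to sniffing the body in case the header is missing or generic.
    looks_like_html = body.lstrip().lower().startswith(("<!doctype html", "<html"))
    if media_type == "text/html" or looks_like_html:
        return "html"
    return "other"


print(describe_response("text/html; charset=UTF-8", "<!DOCTYPE html><html>...</html>"))
```

With requests, this could be called as describe_response(r.headers.get("Content-Type", ""), r.text). Printing r.history as well shows whether the request was redirected, for example to a login URL, before the final response was produced.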
Solution 1:[1]
Why don't you try to get a SPARQL query answered "by hand", using requests + OAuth? If you can, you'll know that we've got a bug in SPARQLWrapper rather than an issue in your application code.
The requests code should look something like the following + OAuth stuff:
r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    auth=auth,
    headers={"Accept": "application/sparql-results+json"}
)
Nick
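To take the "by hand" idea one step further, the signed Authorization header can be computed with only the standard library and compared against what oauthlib or requests_oauthlib send. This is a sketch of OAuth 1.0a HMAC-SHA1 signing as described in RFC 5849, under these assumptions: a GET request whose parameters are all in the query string, no request body, and a base_url passed in without its query string. The function names are hypothetical, and nothing is sent over the network.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def pct(value):
    """Percent-encode so that only RFC 3986 unreserved characters stay literal."""
    return quote(str(value), safe="-._~")


def authorization_header(method, base_url, query_params, consumer_key, consumer_secret,
                         token, token_secret, nonce=None, timestamp=None):
    """Build an OAuth 1.0a Authorization header value using HMAC-SHA1."""
    oauth_params = {
        "oauth_consumer_key": consumer_key,
        "oauth_token": token,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp if timestamp is not None else int(time.time())),
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_version": "1.0",
    }
    # The signature covers the query parameters as well as the oauth_* parameters.
    encoded = sorted((pct(k), pct(v)) for k, v in {**query_params, **oauth_params}.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join([method.upper(), pct(base_url), pct(normalized)])
    # The signing key is the two secrets, each encoded, joined by "&".
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    oauth_params["oauth_signature"] = base64.b64encode(digest).decode()
    return "OAuth " + ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth_params.items()))


# Placeholder credentials; a fixed nonce and timestamp make the output repeatable.
print(authorization_header(
    "GET", "https://wcqs-beta.wmflabs.org/sparql",
    {"query": "SELECT ?file WHERE { ?file wdt:P180 wd:Q42 . }"},
    "consumer-key", "consumer-secret", "access-token", "access-secret",
    nonce="fixednonce", timestamp=1600000000,
))
```

If a header built this way is rejected in the same manner as the library-generated one, the signing code is unlikely to be the cause.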
Solution 2:[2]
If you're asking about MediaWiki OAuth v1 authentication
I interpret this as meaning that you're looking for a way to do OAuth (v1) against a Wikimedia site alone, and that the rest of your code isn't really part of the question. Correct me if I'm wrong.
You don't specify what kind of application you're developing. There are different ways to authenticate against Wikimedia sites using OAuth; for web applications, you can use either Flask or Django with the correct back-end support.
A more "general" way is to use the mwoauth library (python-mwoauth), which works from any application. It is still supported on both Python 3 and Python 2.
I assume the following:
- The target server has a MediaWiki installation with the OAuth Extension installed.
- You want to perform an OAuth handshake with this server for authentication purposes.
Using Wikipedia.org as the example target platform:
$ pip install mwoauth
# Find a suitable place, depending on your app to include the authorization code:
from mwoauth import ConsumerToken, Handshaker
from six.moves import input # For compatibility between python 2 and 3
# Construct a "consumer" from the key/secret provided by the MediaWiki site
import config
consumer_token = ConsumerToken(config.consumer_key, config.consumer_secret)
# Construct handshaker with wiki URI and consumer
handshaker = Handshaker("https://en.wikipedia.org/w/index.php",
                        consumer_token)
# Step 1: Initialize -- ask MediaWiki for a temporary key/secret for user
redirect, request_token = handshaker.initiate()
# Step 2: Authorize -- send user to MediaWiki to confirm authorization
print("Point your browser to: %s" % redirect)
response_qs = input("Response query string: ")
# Step 3: Complete -- obtain authorized key/secret for "resource owner"
access_token = handshaker.complete(request_token, response_qs)
print(str(access_token))
# Step 4: Identify -- (optional) get identifying information about the user
identity = handshaker.identify(access_token)
print("Identified as {username}.".format(**identity))
# Fill in the other stuff :)
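Step 2 above asks the user to paste a "response query string", and people often paste the whole callback URL instead. A small sketch of a helper that accepts either form and checks that the two expected fields are present before the value is handed to handshaker.complete follows. extract_response_qs is a hypothetical name, and requiring oauth_verifier and oauth_token is an assumption based on the usual OAuth 1.0a callback.

```python
from urllib.parse import parse_qs, urlsplit


def extract_response_qs(pasted):
    """Return the query-string part of what the user pasted after authorizing."""
    pasted = pasted.strip()
    # Accept a full callback URL as well as a bare query string.
    query = urlsplit(pasted).query if "://" in pasted else pasted.lstrip("?")
    fields = parse_qs(query)
    # Assumption: a successful OAuth 1.0a callback carries both of these fields.
    for name in ("oauth_verifier", "oauth_token"):
        if name not in fields:
            raise ValueError(f"{name} not found in {pasted!r}")
    return query


print(extract_response_qs("https://example.org/callback?oauth_verifier=v123&oauth_token=t456"))
```

In the handshake code above, this would wrap the input call: response_qs = extract_response_qs(input("Response query string: ")).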
I may have misinterpreted your question altogether. If so, please shout to me through my left ear.
GitHub:
Here is a link to the docs, which include an example using Flask: WikiMedia OAuth - Python
Solution 3:[3]
I would try running your code against a different endpoint. Instead of https://wcqs-beta.wmflabs.org/sparql, try https://query.wikidata.org/sparql. When I use the first endpoint, I also get the HTML of the login page that you were getting; when I use the second one, I get the correct response:
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(endpoint)
# Example query to return a list of movies that Christian Bale has acted in:
query = """
SELECT ?film ?filmLabel (MAX(?pubDate) as ?latest_pubdate) WHERE {
  ?film wdt:P31 wd:Q11424 .
  ?film wdt:P577 ?pubDate .
  ?film wdt:P161 wd:Q45772 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?film ?filmLabel
ORDER BY DESC(?latest_pubdate)
LIMIT 50
"""
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
# Define a quick function to get json into pandas dataframe:
import pandas as pd
from pandas import json_normalize
def df_from_res(j):
    df = json_normalize(j['results']['bindings'])[['filmLabel.value','latest_pubdate.value']]
    df['latest_pubdate.value'] = pd.to_datetime(df['latest_pubdate.value']).dt.date
    return df
df_from_res(results).head(5)
# filmLabel.value latest_pubdate.value
# 0 Ford v Ferrari 2019-11-15
# 1 Vice 2019-02-21
# 2 Hostiles 2018-05-31
# 3 The Promise 2017-08-17
# 4 Song to Song 2017-05-25
And this endpoint also works with the requests library in a similar way:
import requests
payload = {'query': query, 'format': 'json'}
results = requests.get(endpoint, params=payload).json()
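If pandas is not needed, the JSON returned by either approach can be flattened with plain Python. The sketch below assumes the standard SPARQL 1.1 JSON results layout (head.vars plus results.bindings). rows_from_results is a hypothetical helper, and the sample data is made up to show the shape rather than copied from a real response.

```python
def rows_from_results(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable may be unbound in a given row; represent that as None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Made-up sample in the SPARQL JSON results shape.
sample = {
    "head": {"vars": ["film", "filmLabel"]},
    "results": {"bindings": [
        {"film": {"type": "uri", "value": "http://example.org/film/1"},
         "filmLabel": {"type": "literal", "value": "Example Film"}},
        {"film": {"type": "uri", "value": "http://example.org/film/2"}},
    ]},
}
for row in rows_from_results(sample):
    print(row)
```

The same function should work on the results object from SPARQLWrapper's convert() and on the dict from requests' .json(), since both are the parsed JSON document.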
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nicholas Car |
| Solution 2 | |
| Solution 3 | user6386471 |
