Microsoft Business Intelligence

Python Script to get Data for Power BI

By 29 January 2020June 23rd, 2020No Comments

For a recent meetup I wanted to query the website predithq. They have a dataset of events and I wanted to use this data in a meetup.

Unfortunately Power BI isn’t compatible with the API this provider uses. They use a 308 Permanent redirect as the initial response, requiring an additional request. After a bit of digging Power Query only handles redirects up to 307.

I have a suspicion this is .net problem, hence, may not be within control of the Power BI team to sort out. I used this blockage to test out using a Python script as a data source for Power BI. The Python web client has no problem handling 308 redirects, thankfully.

I’m running Python under a Conda environment so I had to set the options in Power BI to look at this environment path.

Python Code

I then created Python code and tested it with VS Code. Note, to get the script working with Power BI you need to install packages for pandas and matplotlib.

Power Query is not very good at processing JSON,hence, I cleaned the data in Python and constructed the Data Frame with individual lists so I could take care of missing elements.

import requests
import matplotlib
import pandas as pd


response = requests.get(
    url="https://api.predicthq.com/v1/events",
    headers={
        "Authorization": "Bearer $accesscode$",
        "Accept": "application/json"

    }, params={"limit": "50"}
)


data = response.json()


res = data['results']
category = list(map(lambda x: x['category'], res))
country = list(map(lambda x: x['country'], res))
description = list(map(lambda x: x['description'], res))
duration = list(map(lambda x: x['duration'], res))
end = list(map(lambda x: x['end'], res))
entities = list(map(lambda x: x['entities'], res))
firstseen = list(map(lambda x: x['first_seen'], res))
iid = list(map(lambda x: x['id'], res))
labels = list(map(lambda x: x['labels'], res))
location = list(map(lambda x: x['location'], res))
entity_id = list(map(lambda x: x[0]['entity_id'] if x else '', entities))
formatted_address = list(map(lambda x: x[0]['formatted_address'] if x else '', entities))
venue_name = list(map(lambda x: x[0]['name'] if x else '', entities))
venue_type = list(map(lambda x: x[0]['type'] if x else '', entities))


df = pd.DataFrame(list(zip(
    category, country, formatted_address, venue_name, venue_type,  description, duration, end, firstseen, labels,)),
    columns=['category', 'country', 'formatted_address', 'venue_name', 'venue_type', 'description', 'duration', 'end', 'firstseen', 'labels'])

print(df)

Get Data

Once your happy with the data, open Power BI and Get Data from Python Script.

Paste in the Python Script.

Power BI will then execute the script returning data like any other data, giving you the option to Load or Transform.

I did a little data cleaning in Power Query as there is some data held in nested lists.

Wrapping it Up

In summary, you have a quite a bit of Power to extend the Get Data source without having to learn about the Custom Connector route. I hope Microsoft handle 308 responses soon as I only had to write this code due to the non-conformance of the RFC, which by the way was issued in 2015!

If any MVP’s pick this bug up I would appreciate you feeding back to Microsoft as I expect more and more providers to use 308 redirects for future proofing.