Download API Management Paginated Output to CSV with Python


Hello Reader

Azure API Management is a very powerful tool to control how data and services are exposed to employees, partners, and customers by applying authentication, authorization, and usage limits.

Downloading all API responses from Azure API Management is a challenge: results are paginated, and there is usually a rate limit that stops you from simply calling the API in a tight loop.

You can rate-limit calls with the following policy:

<policies>
    <inbound>
        <base />
        <rate-limit calls="30" renewal-period="90" remaining-calls-variable-name="remainingCallsPerSubscription"/>
    </inbound>
    <outbound>
        <base />
    </outbound>
</policies>

This allows 30 calls per 90-second window; once the limit is exceeded, further calls are rejected until the window renews.
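When a call is throttled, APIM rejects it with HTTP 429 and a small JSON error body, which is what the script below checks for; the message text and the seconds value will vary:

{
    "statusCode": 429,
    "message": "Rate limit is exceeded. Try again in 54 seconds."
}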

So pulling all the data out through pagination is a challenge; to overcome it, we will use the following Python script.

Prerequisites:

  • Python 3+ with the requests and pandas modules (pip install requests pandas)
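
The script assumes the backend returns OData-style pagination: each page carries its records in a value array, plus an @odata.nextLink property while more pages remain. A trimmed example of what one page might look like (the field names here are just placeholders):

{
    "value": [
        { "id": 1, "name": "record 1" },
        { "id": 2, "name": "record 2" }
    ],
    "@odata.nextLink": "https://.../myapi?$top=50&$skip=50"
}
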
#Written By Tushar Kumar
#followtushar@outlook.com
#Theazway.com 

import requests
import json
import time
import pandas as pd

def getdata():
    # Fill in your APIM subscription key and the base URL of the API.
    # url_base should already carry a query string, since "&$skip=" is appended below.
    header = {"Ocp-Apim-Subscription-Key": ""}
    file_name = 'Output.json'
    url_base = ""

    s = requests.Session()
    nextlink = 1
    records = 0
    jsondata = []
    while nextlink > 0:
        # Build the URL with the current $skip value
        print(records)
        url_get = url_base + "&$skip=" + str(records)
        print(url_get)
        # Call the API to get the next page
        # (verify=False skips TLS certificate validation; drop it if your endpoint has a valid certificate)
        resp = s.get(url_get, headers=header, verify=False)
        data = resp.json()
        # APIM's rate-limit policy rejects throttled calls with a JSON error body
        if 'statusCode' in data and data['statusCode'] == 429:
            print('API threshold reached, starting again in 90s')
            time.sleep(90)  # match this to the renewal-period in your policy
            continue
        if '@odata.nextLink' not in data:
            # Last page: collect any remaining records and stop the loop
            if 'value' in data:
                jsondata += data["value"]
            nextlink = 0
            print('*********************************End of Script************************************************')
            print('No next page found!')
        else:
            print('Collecting data from next pages:')
            print('-----------------------------------------------------------------------------------------')
            jsondata += data["value"]
            print('-----------------------------------------------------------------------------------------')
        # Advance $skip by the page size for the next request (50 records per page here)
        records += 50
    with open(file_name, 'w+', encoding='utf-8') as f:
        json.dump(jsondata, f, ensure_ascii=False, indent=4)
def convert():
    # Load the JSON written by getdata() and save it as CSV
    df = pd.read_json('Output.json')
    df.to_csv('Output.csv', index=False)


getdata()
convert()
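
If your records contain nested objects, pd.read_json leaves them as raw dicts in the CSV. As a sketch, pandas' json_normalize can flatten them into dotted columns instead:

import json
import pandas as pd

with open('Output.json', encoding='utf-8') as f:
    records = json.load(f)
df = pd.json_normalize(records)  # nested fields become "parent.child" columns
df.to_csv('Output.csv', index=False)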

This will pull all the data down and write it to a CSV. Adjust the time.sleep value to match the rate limit you have set; in my case the renewal period is 90 seconds.

So in each cycle, the script fetches pages until the 30-call limit is hit, sleeps for 90 seconds, and then resumes from the same page.
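If you would rather not hard-code the sleep, a small variation is to honor the Retry-After header that the gateway typically includes on 429 responses, falling back to 90 seconds when it is absent:

# inside the while loop, replacing the statusCode check above
if resp.status_code == 429:
    wait = int(resp.headers.get("Retry-After", 90))  # fall back to 90s if the header is missing
    print('API threshold reached, retrying in ' + str(wait) + 's')
    time.sleep(wait)
    continue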

Thanks for your precious time.
