Get list of all files from Bitbucket with API 2.0 and Python

I used to manage the file commits of my Bitbucket on-premise instance with API 1.0 and Python. Retrieving the list of all files was quite easy with the '/files' endpoint:

import pandas as pd
import requests
import tqdm

# login_id, login_pwd and df_repos (with 'project.key' and 'name' columns)
# are assumed to be defined beforehand.
url_0 = 'https://bitbucket.mycompany.intra/rest/api/1.0/projects/'
url_1 = '/repos/'
url_2 = '/files?limit=100000'

# DataFrame to store the list of all files in my Bitbucket instance
df_files = pd.DataFrame(columns=['values', 'size', 'isLastPage', 'start', 'limit',
                                 'nextPageStart', 'project.key', 'repos.name'])

for i in tqdm.tqdm(range(len(df_repos)), position=0):
    url_proj = df_repos["project.key"][i]
    url_repos = df_repos["name"][i]
    url = url_0 + url_proj + url_1 + url_repos + url_2
    # verify=False kept from the original setup (internal instance, e.g. self-signed certificate)
    response = requests.get(url, verify=False, auth=(login_id, login_pwd))
    r = response.json()
    df_files_it = pd.DataFrame.from_dict(r)
    df_files_it['project.key'] = url_proj
    df_files_it['repos.name'] = url_repos
    df_files = pd.concat([df_files, df_files_it])

df_files = df_files.reset_index(drop=True)

I am migrating my on-premise Bitbucket to the cloud version, where only API 2.0 is available. So I have to find a way to get the list of all files in my repos. I was able to get the list of all repos:

from requests.auth import HTTPBasicAuth

df_repos = pd.DataFrame(columns=['uuid', 'slug', 'full_name', 'created_on',
                                 'updated_on', 'is_private'])

# Request 100 repositories per page (and only the fields we need), plus the next page URL
next_page_url = 'https://api.bitbucket.org/2.0/repositories/mycompany?pagelen=100&fields=next,values.uuid,values.updated_on,values.html,values.full_name,values.created_on,values.slug,values.is_private'

# Keep fetching pages while there's a page to fetch
while next_page_url is not None:
    response = requests.get(next_page_url, auth=HTTPBasicAuth(login_id, login_pwd))
    page_json = response.json()

    # Parse repositories from the JSON
    for repo in page_json['values']:
        df_repos_it = pd.DataFrame(repo, index=[0])
        df_repos_it = df_repos_it[['uuid', 'slug', 'full_name', 'created_on',
                                   'updated_on', 'is_private']]
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        df_repos = pd.concat([df_repos, df_repos_it])

    # Get the next page URL, if present
    # It includes the same query parameters, so no need to append them again
    next_page_url = page_json.get('next', None)
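
One wrinkle worth noting: every row appended this way carries index 0, so df_repos should be reindexed before it is iterated by position the way the 1.0 loop above does:

# Each appended row has index 0; reindex so positional access works
df_repos = df_repos.reset_index(drop=True)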

But I'm not able to get the list of all files of the repos in df_repos. Do I need to do something recursive to get every element from:

page_url = 'https://api.bitbucket.org/2.0/repositories/mycompany/repos_name/src'
response = requests.get(page_url, auth=HTTPBasicAuth(login_id, login_pwd))
page_json = response.json()
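
For reference, each entry in values seems to carry a path and a type (commit_file for files, commit_directory for folders), and long listings carry a next link; a quick probe, assuming the request above succeeded:

# Inspect what one /src page contains
for entry in page_json['values']:
    print(entry['type'], entry['path'])   # 'commit_file' or 'commit_directory'
print(page_json.get('next'))              # URL of the next page, if any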

Thanks for your help!



Solution 1:[1]

Hello, here is the way I used: 1) create an OAuth consumer, 2) get the token:

import requests
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict()
headers["Content-Type"] = "application/x-www-form-urlencoded"
url = "https://bitbucket.org/site/oauth2/access_token"

# Base64-encoded "consumer_key:secret" (truncated here)
headers["Authorization"] = "Basic kjbkl...."

data = "grant_type=client_credentials"
resp = requests.post(url, headers=headers, data=data)
token = resp.json().get("access_token")

# Reuse the same headers for the API calls below, now with the bearer token
headers["Authorization"] = "Bearer " + token
headers["Accept"] = "application/json"

Make sure to use a branch URL:

url= "https://api.bitbucket.org/2.0/repositories/workspace/repos_name/src/branch_name"
        
def get_files(url):
        li=[]
        response_p = requests.request(
                 "GET",
                  url+"/?fields=values.path",
                  headers=headers
                  )
        json_p=response_p.json()
    
        for el in (json_p["values"]):
          path=el["path"]
          s_url=url+"/"+path.split("/")[-1]
          if '.' in path :
            li.append(path)
          else:
            li.extend(get_files(s_url))
        return li
                
li=get_files(url)
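
Two caveats with the heuristic above: a directory with a dot in its name (or a file without an extension) is misclassified, and fields=values.path drops the next link, so directories with more entries than one page are silently truncated. A variant sketch that instead keys off the type field that /src reports for each entry and follows next links (same url and headers as above):

# Sketch: classify entries by their 'type' field and follow 'next' links
# so paginated directory listings are fully traversed.
def get_files_by_type(url):
    files = []
    page_url = url + "/?fields=next,values.path,values.type"
    while page_url is not None:
        page = requests.get(page_url, headers=headers).json()
        for el in page["values"]:
            if el["type"] == "commit_file":
                files.append(el["path"])
            else:
                # 'commit_directory': descend one level and recurse
                files.extend(get_files_by_type(url + "/" + el["path"].split("/")[-1]))
        page_url = page.get("next")
    return files

li = get_files_by_type(url)

The resulting list of paths can then be loaded into a DataFrame (e.g. pd.DataFrame({'path': li})) to feed the df_files workflow from the question.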

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Anis BEN MOUSSA