'How can extract an income statement from all company concepts?

All company concepts in xbrl format can be extracted with sec's RESTful api. For example,i want to get tesla's concepts in xbrl format in 2020, get the tesla's cik and the url for api.

cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)

To express financial statement in 2020 with elements fy and fp:

'fy' ==  2020 and 'fp' == 'FY'

I write the whole python code to call sec's api:

import requests
import json
cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
          }
res = requests.get(url=url,headers=headers)
result = json.loads(res.text)
concepts_list = list(result['facts']['us-gaap'].keys())
data_concepts = result['facts']['us-gaap']

fdata = {}
for item in concepts_list:
    data = data_concepts[item]['units']
    data_units = list(data_concepts[item]['units'].keys())
    for data_units_attr in data_units:
        for record in data[data_units_attr]:
            if record['fy'] ==  2020 and record['fp'] == 'FY':
                fdata[item] = record['val']

fdata contains all the company concepts and its value in 2020 for tesla,show part of it:

fdata
{'AccountsAndNotesReceivableNet': 334000000, 
 'AccountsPayableCurrent': 6051000000,    
 'AccountsReceivableNetCurrent': 1886000000,

How can get all concepts below to income statement?I want to extract it to make an income statement.

Maybe i should add some values in dei with almost same way as above.

EntityCommonStockSharesOutstanding:959853504
EntityPublicFloat:160570000000

It is simple to parse financial statement such as income statement from ixbrl file:

https://www.sec.gov/ix?doc=/Archives/edgar/data/0001318605/000156459021004599/tsla-10k_20201231.htm

I can get it ,please help to replicate the annual income statement on 2020 for Tesla from sec's RESTful api,or extract the income statement from the whole instance file:

https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt

If you tell me how to get all concepts belong to income statement ,i can fulfill the job ,with the same way balance and cashflow statement all can be extracted.In my case the fdata contain all concepts belong to income,balance,cashflow statement,which concept in fdata belong to which financial statement? How to map every concepts into income,balance,cashflow statement?

#expression in pseudocode 
income_statement = fdata[all_concepts_belong_to_income_statement]
balance_statement = fdata[all_concepts_belong_to_balance_statement]
cashflow_statement = fdata[all_concepts_belong_to_cashflow_statement]


Solution 1:[1]

If I understand you correctly, the following should help you get least part of the way there.I'm not going to try to replicate the financial statements; instead, I'll only choose one, and show the concepts used in creating the statement - you'll have to take it from there.

I'll use the "income statement" (formally called in the instance document: "Consolidated Statements of Comprehensive Income (Loss) - USD ($) - $ in Millions") as the example. Again, the data is not in json, but html.

#import the necessary libraries
import lxml.html as lh
import requests

#get the filing
url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt'
req = requests.get(url, headers={"User-Agent": "b2g"})

#parse the filing content
doc = lh.fromstring(req.content)

#locate the relevant table and get the data
concepts = doc.xpath("//p[@id='Consolidated_Statmnts_of_Cmprehnsve_Loss']/following-sibling::div[1]//table//@name")
for c in set(concepts):
    print(c)

Output: There are 5 concepts, repeated 3 times each:

us-gaap:ComprehensiveIncomeNetOfTaxAttributableToNoncontrollingInterest
us-gaap:OtherComprehensiveIncomeLossForeignCurrencyTransactionAndTranslationAdjustmentNetOfTax
us-gaap:ProfitLoss
us-gaap:ComprehensiveIncomeNetOfTax
us-gaap:ComprehensiveIncomeNetOfTaxIncludingPortionAttributableToNoncontrollingInterest

Solution 2:[2]

I get a indirect way to extract income statement via SEC's api,SEC already publish all data extracted from raw xbrl file(called xbrl instance published also),the tool which parse xbrl file to form four csv files num.txt,sub.txt,pre.txt,tag.txt is not published,what i want to do is to create the tool myself.

Step1:

Download dataset from https://www.sec.gov/dera/data/financial-statement-data-sets.html on 2020,and unzip it ,we get pre.txt,num.txt,sub.txt,tag.txt.

Step2:

Create database and table pre,num,sub,tag according to fields in pre.txt,num.txt,sub.txt,tag.txt.

Step3:

Import pre.txt,num.txt,sub.txt,tag.txt into table pre,num,sub,tag.

Step4:

Query in my postgresql:

the adsh number for tesla's financial statement on 2020 is `0001564590-21-004599`

\set accno  '0001564590-21-004599'
select tag,value from num where adsh=:'accno' and ddate = '2020-12-31' and qtrs=4  and tag in 
(select tag from pre where adsh=:'accno' and stmt='IS'); 
                                             tag                                             |      value       
---------------------------------------------------------------------------------------------+------------------
 NetIncomeLoss                                                                               |   721000000.0000
 OperatingLeasesIncomeStatementLeaseRevenue                                                  |  1052000000.0000
 GrossProfit                                                                                 |  6630000000.0000
 InterestExpense                                                                             |   748000000.0000
 CostOfRevenue                                                                               | 24906000000.0000
 WeightedAverageNumberOfSharesOutstandingBasic                                               |   933000000.0000
 IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest |  1154000000.0000
 EarningsPerShareDiluted                                                                     |           0.6400
 ProfitLoss                                                                                  |   862000000.0000
 OperatingExpenses                                                                           |  4636000000.0000
 InvestmentIncomeInterest                                                                    |    30000000.0000
 OperatingIncomeLoss                                                                         |  1994000000.0000
 SellingGeneralAndAdministrativeExpense                                                      |  3145000000.0000
 NetIncomeLossAttributableToNoncontrollingInterest                                           |   141000000.0000
 StockholdersEquityNoteStockSplitConversionRatio1                                            |           5.0000
 NetIncomeLossAvailableToCommonStockholdersBasic                                             |   690000000.0000
 Revenues                                                                                    | 31536000000.0000
 IncomeTaxExpenseBenefit                                                                     |   292000000.0000
 OtherNonoperatingIncomeExpense                                                              |  -122000000.0000
 WeightedAverageNumberOfDilutedSharesOutstanding                                             |  1083000000.0000
 EarningsPerShareBasic                                                                       |           0.7400
 ResearchAndDevelopmentExpense                                                               |  1491000000.0000
 BuyOutOfNoncontrollingInterest                                                              |    31000000.0000
 CostOfAutomotiveLeasing                                                                     |   563000000.0000
 CostOfRevenuesAutomotive                                                                    | 20259000000.0000
 CostOfServicesAndOther                                                                      |  2671000000.0000
 SalesRevenueAutomotive                                                                      | 27236000000.0000
 SalesRevenueServicesAndOtherNet                                                             |  2306000000.0000
(28 rows)

stmt='IS':get tag in the income statement,ddate='2020-12-31':annual report on 2020 year.Please read SEC's data field definition for qtrs,you may know why to set qtrs=4 in the select command.

Make a comparison with https://www.nasdaq.com/market-activity/stocks/tsla/financials,show part of it,look at the 12/31/2020 column:

Period Ending:  12/31/2021  12/31/2020  12/31/2019  12/31/2018
Total Revenue   $53,823,000 $31,536,000 $24,578,000 $21,461,000
Cost of Revenue $40,217,000 $24,906,000 $20,509,000 $17,419,000
Gross Profit    $13,606,000 $6,630,000  $4,069,000  $4,042,000
Research and Development    $2,593,000  $1,491,000  $1,343,000  $1,460,000

All the number are equal,the terms in xbrl are:Revenues,CostOfRevenue,GrossProfit,ResearchAndDevelopmentExpense ,the respective terms in financial accounting concepts are:Total Revenue,Cost of Revenue,Gross Profit,Research and Development. What i get are right numbers.

Somedays later,i can find the direct way to parse raw xbrl file to get the income statement,i am not familiar with xbrl yet,wish stackoverflow community help me fulfill my goal.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jack Fleeting
Solution 2