'How can extract an income statement from all company concepts?
All company concepts in xbrl format can be extracted with sec's RESTful api. For example,i want to get tesla's concepts in xbrl format in 2020, get the tesla's cik and the url for api.
cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)
To express financial statement in 2020 with elements fy and fp:
'fy' == 2020 and 'fp' == 'FY'
I write the whole python code to call sec's api:
import requests
import json
cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
}
res = requests.get(url=url,headers=headers)
result = json.loads(res.text)
concepts_list = list(result['facts']['us-gaap'].keys())
data_concepts = result['facts']['us-gaap']
fdata = {}
for item in concepts_list:
data = data_concepts[item]['units']
data_units = list(data_concepts[item]['units'].keys())
for data_units_attr in data_units:
for record in data[data_units_attr]:
if record['fy'] == 2020 and record['fp'] == 'FY':
fdata[item] = record['val']
fdata contains all the company concepts and its value in 2020 for tesla,show part of it:
fdata
{'AccountsAndNotesReceivableNet': 334000000,
'AccountsPayableCurrent': 6051000000,
'AccountsReceivableNetCurrent': 1886000000,
How can get all concepts below to income statement?I want to extract it to make an income statement.
Maybe i should add some values in dei with almost same way as above.
EntityCommonStockSharesOutstanding:959853504
EntityPublicFloat:160570000000
It is simple to parse financial statement such as income statement from ixbrl file:
https://www.sec.gov/ix?doc=/Archives/edgar/data/0001318605/000156459021004599/tsla-10k_20201231.htm
I can get it ,please help to replicate the annual income statement on 2020 for Tesla from sec's RESTful api,or extract the income statement from the whole instance file:
https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt
If you tell me how to get all concepts belong to income statement ,i can fulfill the job ,with the same way balance and cashflow statement all can be extracted.In my case the fdata contain all concepts belong to income,balance,cashflow statement,which concept in fdata belong to which financial statement? How to map every concepts into income,balance,cashflow statement?
#expression in pseudocode
income_statement = fdata[all_concepts_belong_to_income_statement]
balance_statement = fdata[all_concepts_belong_to_balance_statement]
cashflow_statement = fdata[all_concepts_belong_to_cashflow_statement]
Solution 1:[1]
If I understand you correctly, the following should help you get least part of the way there.I'm not going to try to replicate the financial statements; instead, I'll only choose one, and show the concepts used in creating the statement - you'll have to take it from there.
I'll use the "income statement" (formally called in the instance document: "Consolidated Statements of Comprehensive Income (Loss) - USD ($) - $ in Millions") as the example. Again, the data is not in json, but html.
#import the necessary libraries
import lxml.html as lh
import requests
#get the filing
url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt'
req = requests.get(url, headers={"User-Agent": "b2g"})
#parse the filing content
doc = lh.fromstring(req.content)
#locate the relevant table and get the data
concepts = doc.xpath("//p[@id='Consolidated_Statmnts_of_Cmprehnsve_Loss']/following-sibling::div[1]//table//@name")
for c in set(concepts):
print(c)
Output: There are 5 concepts, repeated 3 times each:
us-gaap:ComprehensiveIncomeNetOfTaxAttributableToNoncontrollingInterest
us-gaap:OtherComprehensiveIncomeLossForeignCurrencyTransactionAndTranslationAdjustmentNetOfTax
us-gaap:ProfitLoss
us-gaap:ComprehensiveIncomeNetOfTax
us-gaap:ComprehensiveIncomeNetOfTaxIncludingPortionAttributableToNoncontrollingInterest
Solution 2:[2]
I get a indirect way to extract income statement via SEC's api,SEC already publish all data extracted from raw xbrl file(called xbrl instance published also),the tool which parse xbrl file to form four csv files num.txt,sub.txt,pre.txt,tag.txt is not published,what i want to do is to create the tool myself.
Step1:
Download dataset from https://www.sec.gov/dera/data/financial-statement-data-sets.html on 2020,and unzip it ,we get pre.txt,num.txt,sub.txt,tag.txt.
Step2:
Create database and table pre,num,sub,tag according to fields in pre.txt,num.txt,sub.txt,tag.txt.
Step3:
Import pre.txt,num.txt,sub.txt,tag.txt into table pre,num,sub,tag.
Step4:
Query in my postgresql:
the adsh number for tesla's financial statement on 2020 is `0001564590-21-004599`
\set accno '0001564590-21-004599'
select tag,value from num where adsh=:'accno' and ddate = '2020-12-31' and qtrs=4 and tag in
(select tag from pre where adsh=:'accno' and stmt='IS');
tag | value
---------------------------------------------------------------------------------------------+------------------
NetIncomeLoss | 721000000.0000
OperatingLeasesIncomeStatementLeaseRevenue | 1052000000.0000
GrossProfit | 6630000000.0000
InterestExpense | 748000000.0000
CostOfRevenue | 24906000000.0000
WeightedAverageNumberOfSharesOutstandingBasic | 933000000.0000
IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest | 1154000000.0000
EarningsPerShareDiluted | 0.6400
ProfitLoss | 862000000.0000
OperatingExpenses | 4636000000.0000
InvestmentIncomeInterest | 30000000.0000
OperatingIncomeLoss | 1994000000.0000
SellingGeneralAndAdministrativeExpense | 3145000000.0000
NetIncomeLossAttributableToNoncontrollingInterest | 141000000.0000
StockholdersEquityNoteStockSplitConversionRatio1 | 5.0000
NetIncomeLossAvailableToCommonStockholdersBasic | 690000000.0000
Revenues | 31536000000.0000
IncomeTaxExpenseBenefit | 292000000.0000
OtherNonoperatingIncomeExpense | -122000000.0000
WeightedAverageNumberOfDilutedSharesOutstanding | 1083000000.0000
EarningsPerShareBasic | 0.7400
ResearchAndDevelopmentExpense | 1491000000.0000
BuyOutOfNoncontrollingInterest | 31000000.0000
CostOfAutomotiveLeasing | 563000000.0000
CostOfRevenuesAutomotive | 20259000000.0000
CostOfServicesAndOther | 2671000000.0000
SalesRevenueAutomotive | 27236000000.0000
SalesRevenueServicesAndOtherNet | 2306000000.0000
(28 rows)
stmt='IS':get tag in the income statement,ddate='2020-12-31':annual report on 2020 year.Please read SEC's data field definition for qtrs,you may know why to set qtrs=4 in the select command.
Make a comparison with https://www.nasdaq.com/market-activity/stocks/tsla/financials,show part of it,look at the 12/31/2020 column:
Period Ending: 12/31/2021 12/31/2020 12/31/2019 12/31/2018
Total Revenue $53,823,000 $31,536,000 $24,578,000 $21,461,000
Cost of Revenue $40,217,000 $24,906,000 $20,509,000 $17,419,000
Gross Profit $13,606,000 $6,630,000 $4,069,000 $4,042,000
Research and Development $2,593,000 $1,491,000 $1,343,000 $1,460,000
All the number are equal,the terms in xbrl are:Revenues,CostOfRevenue,GrossProfit,ResearchAndDevelopmentExpense ,the respective terms in financial accounting concepts are:Total Revenue,Cost of Revenue,Gross Profit,Research and Development. What i get are right numbers.
Somedays later,i can find the direct way to parse raw xbrl file to get the income statement,i am not familiar with xbrl yet,wish stackoverflow community help me fulfill my goal.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jack Fleeting |
| Solution 2 |
