How to open AWS SOC PDF reports on GNU/Linux?
Solution 1:[1]
It looks like they used the PDF as a container, much like a ZIP file. So the first step is to list what is inside the PDF with pdfdetach, which is provided by the poppler-utils package on Ubuntu (apt install poppler-utils).
If the file is called Service_Organization_Controls_(SOC)_2_Report_-_Current.pdf, you can run:
pdfdetach 'Service_Organization_Controls_(SOC)_2_Report_-_Current.pdf' -list
This lists the embedded files. The output looks like:
$ pdfdetach 'Service_Organization_Controls_(SOC)_2_Report_-_Current.pdf' -list
2 embedded files
1: Service Organization Controls (SOC) 2 Report - Current/AWS SOC 2 Report Apr-Sept 2020 - FINAL.pdf
2: Service Organization Controls (SOC) 2 Report - Current/SOC2 Excel Provided by AWS Apr-Sept 2020.pdf
The embedded file names include a directory, and pdfdetach fails to extract them because that directory doesn't exist. So you have to create it by hand first (quoted, because the name contains spaces and parentheses):
mkdir 'Service Organization Controls (SOC) 2 Report - Current'
pdfdetach 'Service_Organization_Controls_(SOC)_2_Report_-_Current.pdf' -saveall
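If there are many embedded files, creating the directories by hand gets tedious. The following is a minimal sketch that reads the listing and creates whatever parent directories the names mention. It assumes the `N: path/name` listing format shown above, and `make_embedded_dirs` is a hypothetical helper name, not part of poppler-utils.

```shell
# make_embedded_dirs (hypothetical helper): reads `pdfdetach -list` output
# on stdin and creates the parent directory of every listed embedded file.
# Assumes each entry looks like "N: some dir/file name.ext".
make_embedded_dirs() {
    while IFS= read -r line; do
        case "$line" in
            [0-9]*:\ */*)
                # Drop the leading "N: " to get the embedded file name.
                name=${line#*: }
                # Create the directory part; quoting keeps spaces and
                # parentheses intact.
                mkdir -p -- "$(dirname -- "$name")"
                ;;
        esac
    done
}
```

It could be used as `pdfdetach -list file.pdf | make_embedded_dirs` before running `pdfdetach -saveall file.pdf`. Since the directory names come from inside the PDF, it is safer to run this in an empty working directory.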
Now you can open Service Organization Controls (SOC) 2 Report - Current/AWS SOC 2 Report Apr-Sept 2020 - FINAL.pdf as usual, but Service Organization Controls (SOC) 2 Report - Current/SOC2 Excel Provided by AWS Apr-Sept 2020.pdf is just a container again:
pdfdetach "Service Organization Controls (SOC) 2 Report - Current/SOC2 Excel Provided by AWS Apr-Sept 2020.pdf" -list
So you need to extract the real Excel file from it, but this time its name doesn't have spaces:
pdfdetach "Service Organization Controls (SOC) 2 Report - Current/SOC2 Excel Provided by AWS Apr-Sept 2020.pdf" -saveall
And you will get the Excel file as well.
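As an alternative to creating the directory, pdfdetach can also save a single embedded file by its number in the listing, with the `-save` and `-o` options choosing which file and where to write it. A minimal sketch, where `extract_attachment` is a hypothetical wrapper and the file names in the usage note are placeholders:

```shell
# extract_attachment (hypothetical wrapper): save one embedded file.
#   $1 = container PDF
#   $2 = number of the embedded file, as shown by `pdfdetach -list`
#   $3 = output path to write (this bypasses the embedded directory name)
extract_attachment() {
    pdfdetach -save "$2" -o "$3" "$1"
}
```

For example, `extract_attachment container.pdf 1 report.pdf` would write the first embedded file to report.pdf in the current directory.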
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | MagMax |

