Decrypting a GPG-encrypted file in Python with raw key
I'm trying to use Python to decrypt a GPG-encrypted file using the raw key. Not the passphrase, not a nicely formatted file from a keyring, just the literal raw bytes of the key that the file was encrypted with.
I first created a test file:
~$ echo "It would be really cool if this worked" >> PGPDecryptorTest1.txt
I then encrypted the file using AES256 with the passphrase "a" and SHA256 key derivation:
~$ gpg --symmetric --s2k-mode 0 --s2k-digest-algo SHA256 --cipher-algo AES256 PGPDecryptorTest1.txt
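As a sanity check on the key derivation: with --s2k-mode 0 (simple S2K) the passphrase is hashed once, unsalted, and the leftmost bytes of the digest are used as the key. SHA-256 produces 32 bytes, which is exactly the size of an AES-256 key. A minimal sketch using only hashlib (the helper name `simple_s2k_key` is hypothetical and handles only this one case, where the digest is at least as long as the key):

```python
import hashlib


def simple_s2k_key(passphrase: bytes, key_len: int = 32) -> bytes:
    """Derive a key the way simple S2K (mode 0) with SHA-256 does.

    Only covers the case where the digest is at least as long as the key.
    """
    digest = hashlib.sha256(passphrase).digest()
    if key_len > len(digest):
        raise ValueError("key longer than one SHA-256 digest is not handled here")
    return digest[:key_len]


key = simple_s2k_key(b"a")
print(key.hex())
```

For the passphrase "a" this should reproduce the hard-coded `hash_a` constant in the script that follows.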
I wrote the following short script to decode the file:
import sys
from Crypto.Cipher import AES

# With s2k-mode 0 specified, key is just SHA256 hash of passphrase
hash_a = b"\xca\x97\x81\x12\xca\x1b\xbd\xca\xfa\xc2\x31\xb3\x9a\x23\xdc\x4d\xa7\x86\xef\xf8\x14\x7c\x4e\x72\xb9\x80\x77\x85\xaf\xee\x48\xbb"
key = hash_a

def main(filename):
    with open(filename, "rb") as f:
        # First 9 bytes are header, ignore them and read the rest
        contents = f.read()[9:]
    # IV is size (block size + 2)
    # AES uses 16-byte (128-bit) blocks
    # Last two bytes are for checksum
    iv = contents[0:16 + 2]
    # Rest of contents should be ciphertext
    ciphertext = contents[16 + 2:]
    # Use openPGP special cipher mode
    cipher = AES.new(key, AES.MODE_OPENPGP, iv=iv)
    plaintext = cipher.decrypt(ciphertext)
    print("Output: " + str(plaintext))

if __name__ == "__main__":
    if (len(sys.argv) > 1):
        main(sys.argv[1])
    else:
        main(input("Please specify an input file: "))
However, the output for this program is unintelligible garbage.
~$ python3 PGPDecryptor.py PGPDecryptorTest1.txt.gpg
Output: b'\x11\xd6\xf4\x8d\xf7/o.\x13k#D\xd1!\xce\xf5\xf9\xd9\x0b,\xdb\xe4\xd6,\xb8\x80\xcb2N\xd1^\x96\x8chP\xfb\xb0?Z\xb2\xed?\xce==\xfb9\xcf5o{\xb6\x12\xf3\xf7\xc9QC\xc3\xb5\xe4\x95ab?\x17\x9d\xd3\xd3\xc6\xa8j#K\x8cMf\xc6\x00V\x89Y\xe2\xe7~\xc4B\xd5\x1b\x8f\xe9&t'
I have verified the key by other methods, so I'm confident that it's correct. I must be very close to a proper solution, because changing either the IV or the key even slightly causes the following error to appear:
ValueError: Failed integrity check for OPENPGP IV
This suggests that I'm getting the key and IV correct. I've tried a nested for loop to try every valid combination of start and end indices for the ciphertext, just in case there was some additional garbage/header data somewhere, but with equally useless output for every combination.
If anyone can tell me what I'm doing wrong/how to correct it, I'd be very grateful. I suspect the error is very simple, but the nature of the problem makes it difficult to troubleshoot.
I currently have a janky alternative solution that involves modifying the pgpy library, but my problem with this is that importing large files to process (~500MB) takes a long time (~20-30 minutes). I looked at gnupg as well, but it's just a wrapper--it can decrypt with passphrases, but not with raw keys.
Solution 1:[1]
Using AES.MODE_OPENPGP would probably work for a Symmetrically Encrypted Data packet (tag 9), as it simply contains the encrypted data (reference).
However, that's not what you've produced with your gpg invocation. To get some insight into what we're actually dealing with, you can use the --list-packets command:
$ gpg --list-packets --verbose PGPDecryptorTest1.txt.gpg
gpg: AES256.CFB encrypted data
gpg: pinentry launched (90945 curses 1.1.0 /dev/pts/6 screen -)
gpg: encrypted with 1 passphrase
# off=0 ctb=8c tag=3 hlen=2 plen=4
:symkey enc packet: version 4, cipher 9, aead 0,s2k 0, hash 8
# off=6 ctb=d2 tag=18 hlen=2 plen=112 new-ctb
:encrypted data packet:
length: 112
mdc_method: 2
# off=27 ctb=a3 tag=8 hlen=1 plen=0 indeterminate
:compressed packet: algo=1
# off=29 ctb=ac tag=11 hlen=2 plen=66
:literal data packet:
mode b (62), created 1652044966, name="PGPDecryptorTest1.txt",
raw data: 39 bytes
Two things of note:
- The encrypted data packet is tag 18, which is a Symmetrically Encrypted Integrity Protected Data packet. We're no longer dealing with only the output of the cipher, but with data preceded by a version number and followed by a Modification Detection Code packet (reference).
- The encrypted data packet's contents are compressed.
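The ctb, tag, hlen and plen columns in the listing above come straight from the packet headers, so they can be decoded by hand. The following is a rough sketch of RFC 4880 section 4.2 header parsing; `parse_packet_header` is a hypothetical helper, partial body lengths are not handled, and the length octets in the sample inputs (04, 70, 42) are inferred from the plen values in the listing rather than copied from the file.

```python
from typing import Optional, Tuple


def parse_packet_header(data: bytes, offset: int = 0) -> Tuple[int, int, Optional[int]]:
    """Return (tag, header_len, body_len); body_len is None if indeterminate."""
    ctb = data[offset]
    if not ctb & 0x80:
        raise ValueError("bit 7 must be set in an OpenPGP packet header")
    if ctb & 0x40:
        # New format: the tag is the low 6 bits, the length follows.
        tag = ctb & 0x3F
        first = data[offset + 1]
        if first < 192:
            return tag, 2, first
        if first < 224:
            second = data[offset + 2]
            return tag, 3, ((first - 192) << 8) + second + 192
        if first == 255:
            return tag, 6, int.from_bytes(data[offset + 2:offset + 6], "big")
        raise NotImplementedError("partial body lengths are not handled")
    # Old format: tag in bits 5-2, length type in bits 1-0.
    tag = (ctb >> 2) & 0x0F
    length_type = ctb & 0x03
    if length_type == 3:
        return tag, 1, None  # indeterminate length
    n = 1 << length_type  # 1, 2 or 4 length octets
    return tag, 1 + n, int.from_bytes(data[offset + 1:offset + 1 + n], "big")


for raw in ("8c04", "d270", "a3", "ac42"):
    print(raw, parse_packet_header(bytes.fromhex(raw)))
```

The tags printed for the four sample headers should line up with tag 3, 18, 8 and 11 in the gpg output.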
WARNING: The code below is just a rough demonstration of poking around the OpenPGP message format. It is brittle and shouldn't be reused. The main takeaway is that reliably parsing OpenPGP messages isn't trivial and you should use a well-tested library.
The main references I used are:
- RFC 4880 - OpenPGP Message Format
- GPGLib2 (Another pure Python GPG implementation that might be worth testing)
To more easily demonstrate digging into this, I've produced an encrypted message with compression turned off:
$ cat PGPDecryptorTest1.txt
It would be really cool if this worked
$ gpg --symmetric -o PGPDecryptorTest1.txt.uncompressed.gpg --compress-level 0 --s2k-mode 0 --s2k-digest-algo SHA256 --cipher-algo AES256 PGPDecryptorTest1.txt
gpg: Note: simple S2K mode (0) is strongly discouraged
$ python3 solution.py PGPDecryptorTest1.txt.uncompressed.gpg
contents(len: 117): b'8c0404090008d26d017795712a4686d1a176a0f150a33b9c972d876948df739b1058a513f916ef8094c80ae65ed022c30e1108d20dbeaeee70285e8736e8184520ceb0c435feafdd856051eb166e96e32e82ba51a3af4d230174e97a8f3a3529606b6558fce716bf3b0e9b856d442f5104f3647af0'
decrypted_iv(len: 16): b'49355c68e8e3eba7cc5ccb529d158a2c'
first_block(len: 16): b'8a2cac42621550475044656372797074'
decrypted_data (first block)(len: 14): b'ac42621550475044656372797074'
decrypted_data(len: 68): b'ac426215504750446563727970746f7254657374312e74787462783732497420776f756c64206265207265616c6c7920636f6f6c206966207468697320776f726b65640a'
plaintext(len: 39): b'497420776f756c64206265207265616c6c7920636f6f6c206966207468697320776f726b65640a'
It would be really cool if this worked
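The decrypted_data line in the output above is a complete literal data packet (RFC 4880 section 5.9), which can be taken apart field by field. A minimal sketch, assuming an old-format header with a one-octet length as seen here (ctb 0xac); `parse_literal_packet` is a hypothetical helper, and the hex string is the decrypted_data value copied from the output above.

```python
def parse_literal_packet(packet: bytes):
    """Split an old-format literal data packet with a 1-octet length.

    Returns (mode, filename, created, data).
    """
    if packet[0] != 0xAC:
        raise ValueError("expected ctb 0xac (old format, tag 11, 1-octet length)")
    body = packet[2:2 + packet[1]]
    mode = chr(body[0])                 # 'b' for binary, 't' for text
    name_len = body[1]
    name = body[2:2 + name_len].decode()
    ts_off = 2 + name_len
    created = int.from_bytes(body[ts_off:ts_off + 4], "big")
    return mode, name, created, body[ts_off + 4:]


decrypted_data = bytes.fromhex(
    "ac426215504750446563727970746f7254657374312e747874"
    "62783732"
    "497420776f756c64206265207265616c6c7920636f6f6c20"
    "6966207468697320776f726b65640a"
)
mode, name, created, data = parse_literal_packet(decrypted_data)
print(mode, name, created)
print(data.decode(), end="")
```

This performs the same offset arithmetic as the script that follows, just with each field named.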
Here's the solution implementation:
import binascii
import hashlib
import sys
from Cryptodome.Cipher import AES

def print_bytes(name, data):
    print("%s(len: %d): %s" % (name, len(data), str(binascii.hexlify(data))))

def main(filename):
    # Generate key material from the passphrase.
    passphrase = b"a"
    m = hashlib.sha256()
    m.update(passphrase)
    key = m.digest()

    # Get file contents.
    with open(filename, "rb") as f:
        contents = f.read()
    print_bytes("contents", contents)

    # Constants
    header_len = 9  # including the 1-octet tag-18 version number
    block_size = 16  # algorithm details should normally be extracted from the header
    segment_size = block_size * 8
    iv_len = block_size
    iv_tag_len = 2
    mdc_len = 22

    # "Manually" decrypting to adhere to
    # https://datatracker.ietf.org/doc/html/rfc4880#section-5.13
    # Doing it this way helps with the integrity check, which I ended
    # up skipping.
    cipher = AES.new(key, AES.MODE_CFB, iv=b"\x00" * block_size, segment_size=segment_size)
    offset = header_len
    decrypted_iv = cipher.decrypt(contents[offset:offset+block_size])
    print_bytes("decrypted_iv", decrypted_iv)

    decrypted_data = bytearray()
    offset += block_size
    first_block = cipher.decrypt(contents[offset:offset+block_size])
    print_bytes("first_block", first_block)
    offset += block_size

    # The first two bytes after the random prefix repeat its last two bytes.
    if first_block[:2] != decrypted_iv[-2:]:
        print("IV check failed")
        sys.exit(1)
    decrypted_data.extend(first_block[2:])
    print_bytes("decrypted_data (first block)", decrypted_data)

    # Zero-pad the remaining ciphertext to a whole number of blocks.
    padding = block_size - (len(contents)-offset) % block_size
    contents += b"\x00" * padding
    decrypted_data.extend(cipher.decrypt(contents[offset:]))

    # Here is where you should parse the MDC packet and use it for integrity
    # checking. Instead, skipping the check and discarding the packet for
    # brevity.
    decrypted_data = decrypted_data[:-(mdc_len+padding)]
    print_bytes("decrypted_data", decrypted_data)

    # Extract filename length so we can find the plaintext offset.
    # see https://datatracker.ietf.org/doc/html/rfc4880#section-5.9
    filename_len = decrypted_data[3]
    plaintext_offset = (
        2  # header
        + 1  # file type
        + 1  # filename length
        + filename_len  # filename contents
        + 4  # timestamp
    )
    plaintext = decrypted_data[plaintext_offset:]
    print_bytes("plaintext", plaintext)
    print(plaintext.decode())

if __name__ == "__main__":
    if (len(sys.argv) > 1):
        main(sys.argv[1])
    else:
        print("first argument must be file to decrypt")
        sys.exit(1)
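The implementation above discards the MDC packet without checking it. Per RFC 4880 section 5.13, the check is a SHA-1 over the decrypted prefix (block size + 2 bytes), the inner packets, and the two MDC header bytes 0xD3 0x14, compared against the trailing 20 bytes. A rough sketch of that check follows; `check_mdc` is a hypothetical helper, and the demo input is synthetic rather than taken from the test file.

```python
import hashlib
import hmac

BLOCK_SIZE = 16   # AES block size
MDC_LEN = 22      # 2-byte packet header (0xD3 0x14) + 20-byte SHA-1


def check_mdc(decrypted: bytes) -> bytes:
    """Verify the MDC of a decrypted tag-18 body and return the inner packets.

    `decrypted` must be prefix (BLOCK_SIZE + 2 bytes) + inner packets + MDC
    packet, with any zero padding already removed.
    """
    if len(decrypted) < BLOCK_SIZE + 2 + MDC_LEN:
        raise ValueError("decrypted data too short to contain an MDC")
    body, mdc = decrypted[:-MDC_LEN], decrypted[-MDC_LEN:]
    if mdc[:2] != b"\xd3\x14":
        raise ValueError("MDC packet header not found")
    expected = hashlib.sha1(body + b"\xd3\x14").digest()
    if not hmac.compare_digest(expected, mdc[2:]):
        raise ValueError("MDC mismatch: data was modified or the key is wrong")
    return body[BLOCK_SIZE + 2:]


# Synthetic example: 16 prefix bytes, the last two repeated, then a tiny packet.
prefix = bytes(range(BLOCK_SIZE)) + bytes([14, 15])
inner = b"\xac\x03abc"
digest = hashlib.sha1(prefix + inner + b"\xd3\x14").digest()
blob = prefix + inner + b"\xd3\x14" + digest
print(check_mdc(blob) == inner)
```

In terms of the script above, the input would be decrypted_iv, then first_block, then the remaining decrypted bytes with the zero padding stripped but the last 22 bytes kept.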
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | chuckx |
