How to remove sensitive data from a file in GitHub history
I am using a shared GitHub repository to collaborate on a project. Because I am an idiot, I committed and pushed a script file containing a password which I don't want to share (yes, I can change the password, but I would like to remove it anyway!).
Is there any way to revert the commits from GitHub's history, remove the password locally and then recommit and push the updated files? I do not want to remove the file completely, and I would rather not lose the commit history on GitHub.
(The question "How can I completely remove a file from a git repository?" shows how to remove a sensitive file, but not how to edit sensitive data out of a file, so this is not a duplicate.)
Solution 1:[1]
I would recommend using the newer git filter-repo, which replaces BFG and git filter-branch.
First: do this on a copy of your local repo (a fresh clone).
Note: if you get the following error message when running git filter-repo:
Error: need a version of `git` whose `diff-tree` command has the `--combined-all-paths` option
it means you have to update git.
At the end, you can (if you are the only one working on that repository) do a git push --force.
For the rewrite itself, see "Content based filtering" in the git filter-repo documentation:
If you want to modify file contents, you can do so based on a list of expressions in a file, one per line.
For example, with a file named expressions.txt containing:
p455w0rd
foo==>bar
glob:*666*==>
regex:\bdriver\b==>pilot
literal:MM/DD/YYYY==>YYYY-MM-DD
regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2
then running
$ git filter-repo --replace-text expressions.txt
will go through and replace:
- p455w0rd with ***REMOVED***,
- foo with bar,
- any line containing 666 with a blank line,
- the word driver with pilot (but not if it has letters before or after; e.g. drivers will be unmodified),
- the exact text MM/DD/YYYY with YYYY-MM-DD and
- date strings of the form MM/DD/YYYY with ones of the form YYYY-MM-DD.
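Put together, the steps in this answer might look like the sketch below. It assumes git filter-repo is installed and on your PATH. The function name scrub_history, its two arguments and the temporary directory layout are hypothetical names chosen for illustration. git filter-repo removes the origin remote after rewriting, so the sketch adds it back before the force push.

```shell
#!/bin/sh
# Sketch only: rewrite history in a throwaway clone, then force-push.
# scrub_history, repo_url and expressions are illustrative names.
scrub_history() (
    repo_url=$1       # clone URL of the shared repository
    expressions=$2    # path to a rules file such as expressions.txt

    # Resolve the rules file to an absolute path before changing directory.
    rules_dir=$(cd "$(dirname "$expressions")" && pwd) || exit 1
    rules="$rules_dir/$(basename "$expressions")"

    # Work in a fresh clone so the original working copy stays untouched.
    work_dir=$(mktemp -d) || exit 1
    git clone "$repo_url" "$work_dir/repo" || exit 1
    cd "$work_dir/repo" || exit 1

    # Rewrite every commit, applying the replacement rules to file contents.
    git filter-repo --replace-text "$rules" || exit 1

    # filter-repo drops the "origin" remote; restore it before pushing.
    git remote add origin "$repo_url" || exit 1

    # Only do this if nobody else has work based on the old history.
    git push origin --force --all && git push origin --force --tags
)
```

It would be called with the repository's clone URL and the rules file, for example scrub_history <clone-url> expressions.txt. The body runs in a subshell, so the caller's working directory is left unchanged.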
Solution 2:[2]
Use BFG: https://rtyley.github.io/bfg-repo-cleaner/
To remove files:
$ bfg --delete-files <file to remove> my-repo.git
You can also use this tool to remove passwords and any other sensitive data.
Prepare a replacement file listing the strings you wish to replace and use BFG to clean them out.
$ bfg --replace-text passwords.txt my-repo.git
# Example of the passwords.txt file:
string1                    # replace with '***REMOVED***' (default text)
string2==>replacementText  # replace with 'replacementText' instead
string3==>                 # replace with the empty string
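The BFG documentation runs the tool against a bare mirror clone and follows it with a reflog expiry and garbage collection. Below is a sketch of those steps, assuming bfg is available as a command (as in the examples above). The function name clean_with_bfg and its arguments are hypothetical. By default BFG leaves the contents of your latest commit untouched, so remove the password from the current version of the file in an ordinary commit first.

```shell
#!/bin/sh
# Sketch of the BFG workflow; clean_with_bfg and its arguments are
# illustrative names, not part of BFG itself.
clean_with_bfg() (
    repo_url=$1        # clone URL of the repository
    replacements=$2    # absolute path to a file such as passwords.txt

    work_dir=$(mktemp -d) || exit 1
    cd "$work_dir" || exit 1

    # A mirror clone is bare and carries every ref.
    git clone --mirror "$repo_url" my-repo.git || exit 1

    # Rewrite the history, replacing each listed string.
    bfg --replace-text "$replacements" my-repo.git || exit 1

    # Drop the old, now unreferenced objects from the local copy.
    cd my-repo.git || exit 1
    git reflog expire --expire=now --all || exit 1
    git gc --prune=now --aggressive || exit 1

    # From a mirror clone, a plain push updates all refs on the remote.
    git push
)
```

The function stops at the first failing step, so nothing is pushed unless the clone, the rewrite and the cleanup all succeed.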
Solution 3:[3]
If your content has already been pushed to GitHub, then after scrubbing the repository with git filter-repo or bfg and force-pushing the cleaned-up repository, reach out to GitHub Support. They will then make sure all references to the commit and its files are deleted from issue references, pull requests and the cached data GitHub keeps. Only then will the password really be gone from your repositories.
If anyone forked your repository and synced in the sensitive commit, then there is no way to force GitHub to clean up their repositories too. You'll need to ask each owner of the forks to go through the same process.
Consider your password burned. Since your password is out there and since it will take some time to be fully removed, there is ample time for a bad actor to scrape your current repo state and store the password for later use. Always reset the password. Do not fall for the trap of thinking you may still be safe.
Make sure any other contributors on your project clone a fresh copy or rebase their local changes onto the fixed repository. Removing data from history changes the commit IDs of the rewritten commit and of all subsequent commits.
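Before force-pushing, and again when collaborators check their own clones, it can help to confirm whether the secret still appears anywhere in history. Here is a minimal sketch using git's -S search. The helper name count_commits_touching, the throwaway repository and the file deploy.sh are made up for the demonstration.

```shell
#!/bin/sh
# Illustrative helper: count commits, on any ref, whose changes add or remove
# the given literal string (git's "pickaxe" search).
count_commits_touching() {
    git -C "$1" log --all --oneline -S"$2" | wc -l | tr -d ' '
}

# Demo in a throwaway repository: commit a password, then "fix" it normally.
demo=$(mktemp -d)
git -C "$demo" init -q
commit() {
    git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

echo 'PASSWORD=p455w0rd' > "$demo/deploy.sh"
git -C "$demo" add deploy.sh && commit 'add deploy script'

echo 'PASSWORD=$DEPLOY_PASSWORD' > "$demo/deploy.sh"
git -C "$demo" add deploy.sh && commit 'read password from environment'

# The ordinary fix does not remove the secret from history: both the commit
# that added it and the commit that removed it still match.
touched=$(count_commits_touching "$demo" 'p455w0rd')
echo "commits touching the secret: $touched"
```

In this demo the count is 2, which is why a follow-up commit alone is not enough. On a clone whose history has been rewritten, the same check should report 0.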
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |

