How to remove sensitive data from a file in GitHub history

I am using a shared GitHub repository to collaborate on a project. Because I am an idiot, I committed and pushed a script file containing a password which I don't want to share (yes, I can change the password, but I would like to remove it anyway!).

Is there any way to revert the commits from GitHub's history, remove the password locally and then recommit and push the updated files? I do not want to remove the file completely, and I would rather not lose the commit history on GitHub.

(The question "How can I completely remove a file from a git repository?" shows how to remove a sensitive file, but not how to edit sensitive data inside a file, so this is not a duplicate.)



Solution 1:[1]

I would recommend using the newer git filter-repo, which is intended to replace both BFG and git filter-branch.

Note: if you get the following error message when running the git filter-repo commands below:

Error: need a version of `git` whose `diff-tree` command has the `--combined-all-paths` option

it means your version of git is too old, so you have to update it.
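To check ahead of time, you can compare your git version against the minimum git filter-repo needs. This is a small sketch: version_at_least is a hypothetical helper name, and the 2.22.0 threshold is an assumption (as far as I know it is the release that added diff-tree --combined-all-paths, but newer git-filter-repo releases may require more, so check its README).

```shell
#!/usr/bin/env bash
# ASSUMPTION: REQUIRED_GIT is the minimum git version git filter-repo accepts.
# 2.22.0 is believed to be the release that added `diff-tree --combined-all-paths`;
# newer filter-repo releases may need a later git, so check the README.
REQUIRED_GIT="2.22.0"

# Hypothetical helper: succeeds when version $2 >= version $1.
# Uses version-aware sorting (sort -V, supported by GNU and recent BSD/macOS sort).
version_at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# "git version 2.43.0" -> "2.43.0"
current_git="$(git --version | awk '{print $3}')"

if version_at_least "$REQUIRED_GIT" "$current_git"; then
  echo "git $current_git should be new enough for git filter-repo"
else
  echo "git $current_git is older than $REQUIRED_GIT: upgrade git first"
fi
```

sort -V compares each dot-separated number numerically, so 2.9.5 correctly sorts before 2.22.0, which a plain string comparison would get wrong.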


First: do this on a copy of your local repo (a fresh clone).

Then see "Content based filtering" in the git filter-repo documentation, quoted below. At the end, you can (if you are the only one working on that repository) publish the rewritten history with git push --force.

If you want to modify file contents, you can do so based on a list of expressions in a file, one per line.
For example, with a file named expressions.txt containing:

p455w0rd
foo==>bar
glob:*666*==>
regex:\bdriver\b==>pilot
literal:MM/DD/YYYY==>YYYY-MM-DD
regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2

then running

git filter-repo --replace-text expressions.txt

will go through and replace:

  • p455w0rd with ***REMOVED***,
  • foo with bar,
  • any line containing 666 with a blank line,
  • the word driver with pilot (but not if it has letters before or after; e.g. drivers will be unmodified),
  • the exact text MM/DD/YYYY with YYYY-MM-DD and
  • date strings of the form MM/DD/YYYY with ones of the form YYYY-MM-DD.
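Putting the steps of this answer together, a whole run might look roughly like the script below. It is a sketch rather than a tested recipe: REPO_URL is a hypothetical placeholder, SECRET is the leaked string, history_contains and scrub_with_filter_repo are made-up helper names, and it assumes git-filter-repo is installed and expressions.txt is in the current directory. Try it on a throwaway copy first.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS: git-filter-repo is on PATH, REPO_URL is a hypothetical
# placeholder, expressions.txt (see above) is in the current directory,
# and SECRET is the literal string that leaked.
set -euo pipefail

REPO_URL="git@github.com:example-user/example-repo.git"   # hypothetical
SECRET="p455w0rd"

# Hypothetical helper: succeeds if any commit reachable from any ref adds or
# removes the literal string $1 (git log -S is a plain-string search unless
# --pickaxe-regex is given). Run it inside the repository.
history_contains() {
  [ -n "$(git log --all --oneline -S "$1")" ]
}

scrub_with_filter_repo() {
  local expressions="$PWD/expressions.txt"

  # 1. Start from a fresh clone: filter-repo refuses to rewrite a repository
  #    that does not look freshly cloned unless you pass --force.
  git clone "$REPO_URL" scrubbed-repo
  cd scrubbed-repo

  # 2. Rewrite every commit, applying the rules in expressions.txt.
  git filter-repo --replace-text "$expressions"

  # 3. Refuse to push if the secret can still be found in history.
  if history_contains "$SECRET"; then
    echo "secret still present in history; not pushing" >&2
    return 1
  fi

  # 4. filter-repo removes the 'origin' remote after rewriting, so add it
  #    back, then force-push the rewritten branches and tags.
  git remote add origin "$REPO_URL"
  git push --force --all origin
  git push --force --tags origin
}
```

Call scrub_with_filter_repo from the directory that holds expressions.txt. The history_contains check is also handy before rewriting, to see which commits contain the password.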

Solution 2:[2]

Use BFG: https://rtyley.github.io/bfg-repo-cleaner/

To remove files:

$ bfg --delete-files <file to remove> my-repo.git



You can also use this tool to remove passwords and any other sensitive data.

Prepare a file listing the text you want to replace, then run BFG against it:

bfg --replace-text passwords.txt my-repo.git

Example passwords.txt:

string1
string2==>replacementText
string3==>

Here string1 is replaced with ***REMOVED*** (the default text), string2 with replacementText, and string3 with the empty string.
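The usual BFG routine from its documentation is: mirror clone, BFG run, reflog expire plus gc, then push. Below is a rough sketch of that; REPO_URL and scrub_with_bfg are hypothetical names, and it assumes bfg.jar has been downloaded and Java is installed. Note that BFG by default leaves the contents of your latest commit untouched, so remove the password from the current version of the script and commit that change first.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS: bfg.jar was downloaded from the BFG site and Java is installed;
# REPO_URL is a hypothetical placeholder; string1..string3 stand in for the
# real secrets.
set -euo pipefail

REPO_URL="git@github.com:example-user/example-repo.git"   # hypothetical

# Replacement rules, one per line, in the passwords.txt format shown above.
cat > passwords.txt <<'EOF'
string1
string2==>replacementText
string3==>
EOF

scrub_with_bfg() {
  # BFG protects the files in your latest commit by default, so the password
  # must already be removed from the current script (committed and pushed).

  # 1. A bare mirror clone contains every branch and tag.
  git clone --mirror "$REPO_URL" repo.git

  # 2. Replace the listed strings in every blob of history.
  java -jar bfg.jar --replace-text passwords.txt repo.git

  # 3. Drop the old objects: expire the reflog, then garbage-collect.
  cd repo.git
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive

  # 4. In a mirror clone, a plain push force-updates all refs on the remote.
  #    GitHub may reject its read-only refs/pull/* refs; those errors can be ignored.
  git push
}
```

Running the script only writes passwords.txt; call scrub_with_bfg afterwards from the same directory.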

Solution 3:[3]

If your content had already been pushed to GitHub, then after scrubbing the repository with git filter-repo or BFG and force-pushing the cleaned-up repository, reach out to GitHub Support. They will then make sure all references to the commit and its files are deleted from issue references, pull requests and the cached data GitHub keeps. Only then will the password really be gone from your repositories.

If anyone forked your repository and synced in the sensitive commit, then there is no way to force GitHub to clean up their repositories too. You'll need to ask each owner of the forks to go through the same process.

Consider your password burned. Since your password is out there and since it will take some time to be fully removed, there is ample time for a bad actor to scrape your current repo state and store the password for later use. Always reset the password. Do not fall for the trap of thinking you may still be safe.

Make sure any other contributors on your project clone a fresh copy or rebase their local changes onto the fixed repository. Removing data from history changes the commit IDs of the rewritten commit and of every commit after it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3