How to use sed to append to href elements that don't end with a given file extension
I am in the process of migrating content from a web server to an application that stores documents in a different directory than the PHP files. I am trying to figure out how to prepend a new directory to every href attribute whose value does not end with .php, but I am having trouble wrapping my head around how I might do that. For example, given the input:
<p>This is a <a href="/ir/factbooks/file1.php">PHP file</a> and this is a <a href="/ir/factbooks/file2-needs-replaced.pdf">PDF file</a>.</p>
I need to get the following output:
<p>This is a <a href="/ir/factbooks/file1.php">PHP file</a> and this is a <a href="/documents/ir/factbooks/file2-needs-replaced.pdf">PDF file</a>.</p>
Solution 1:[1]
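You can't negate a string in a BRE or ERE as used by sed, so the general approach to your problem is to work in three steps. First, match the strings you want ignored and temporarily change part of each one (I'm using href below) to something else that can't exist in the input, e.g. a newline \n, so they no longer match the regexp for the strings you do want to change. Second, change the part you do want. Third, change that temporary "something else" back to what it started with. A newline is a safe placeholder because sed reads one line at a time and strips the trailing newline, so the pattern space can't contain one unless a command such as N or G puts it there.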
Tested with GNU sed; it may also work with BSD sed (the other one that accepts -E to enable EREs) if it also allows \n to mean newline in both the regexp and replacement positions:
$ sed -E 's:href(="[^"]*\.php"):\n\1:g; s:(href=")([^"]*"):\1/documents\2:g; s/\n/href/g' file
<p>This is a <a href="/ir/factbooks/file1.php">PHP file</a> and this is a <a href="/documents/ir/factbooks/file2-needs-replaced.pdf">PDF file</a>.</p>
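If your sed doesn't treat \n as a newline in the replacement, a variant of the same three steps that may be more portable is to write the newline in the first replacement as a backslash followed by an actual line break, which is the form POSIX describes. This is only a sketch: the function name prefix_non_php_hrefs is made up for illustration, and it assumes your sed accepts -E and that every href value is double-quoted.

```shell
# prefix_non_php_hrefs: hypothetical helper name, not part of the answer above.
# Reads HTML on stdin and writes the rewritten HTML to stdout.
# Step 1: hide href on *.php links behind a newline placeholder
#         (backslash + literal line break instead of \n in the replacement).
# Step 2: prepend /documents to every href that is still visible.
# Step 3: turn the placeholder back into href.
prefix_non_php_hrefs() {
  sed -E \
    -e 's:href(="[^"]*\.php"):\
\1:g' \
    -e 's:(href=")([^"]*"):\1/documents\2:g' \
    -e 's/\n/href/g'
}

# Example run on a shortened version of the input from the question.
printf '%s\n' '<a href="/ir/factbooks/file1.php">PHP</a> <a href="/ir/factbooks/file2.pdf">PDF</a>' |
  prefix_non_php_hrefs
```

Note that the first regexp only protects values where .php comes immediately before the closing quote, so a link such as href="/a/b.php?id=1" would still get the /documents prefix. Adjust that regexp if your content has links like that.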
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
