How to remove unwanted symbols from a text file
I have some text files with unwanted symbols such as
?~, ?~@?, -?~, ?~H~Z, ?~@~S, ?~@~T, : ?~@~], ?, etc
The actual text:
~@~\SEPA for cards is the next logical step in European retail payments integration~@~], says Yves Mersch, Member of the Executive Board of the ECB.
~@~\Giuseppe Penone's tree conveys a sense of stability and growth and is rooted in the humanist values of Europe in the most beautiful way~@~], said Benoît C~Suré
, Member of the Executive Board of the ECB and chair of the art jury which selected the artwork.
~@~\The euro banknotes and coins in everyone~@~Ys wallets are the same in the whole euro area.
~@~\The introduction of the new ~B50 will make our currency even safer~@~], Yves Mersch, ECB Executive Board member, said.
~@~\The successful completion of SEPA further accelerates Europe~@~Ys financial integration~@~], said Yves Mersch, ECB Executive Board member.
The hex dump:
00000000 e2 80 9c 53 45 50 41 20 66 6f 72 20 63 61 72 64 |...SEPA for card|
00000010 73 20 69 73 20 74 68 65 20 6e 65 78 74 20 6c 6f |s is the next lo|
00000020 67 69 63 61 6c 20 73 74 65 70 20 69 6e 20 45 75 |gical step in Eu|
00000030 72 6f 70 65 61 6e 20 72 65 74 61 69 6c 20 70 61 |ropean retail pa|
00000040 79 6d 65 6e 74 73 20 69 6e 74 65 67 72 61 74 69 |yments integrati|
00000050 6f 6e e2 80 9d 2c 20 73 61 79 73 20 59 76 65 73 |on..., says Yves|
00000060 20 4d 65 72 73 63 68 2c 20 4d 65 6d 62 65 72 20 | Mersch, Member |
00000070 6f 66 20 74 68 65 20 45 78 65 63 75 74 69 76 65 |of the Executive|
00000080 20 42 6f 61 72 64 20 6f 66 20 74 68 65 20 45 43 | Board of the EC|
00000090 42 2e 0a e2 80 9c 54 68 65 20 73 75 63 63 65 73 |B.....The succes|
000000a0 73 66 75 6c 20 63 6f 6d 70 6c 65 74 69 6f 6e 20 |sful completion |
000000b0 6f 66 20 53 45 50 41 20 66 75 72 74 68 65 72 20 |of SEPA further |
000000c0 61 63 63 65 6c 65 72 61 74 65 73 20 45 75 72 6f |accelerates Euro|
000000d0 70 65 e2 80 99 73 20 66 69 6e 61 6e 63 69 61 6c |pe...s financial|
000000e0 20 69 6e 74 65 67 72 61 74 69 6f 6e e2 80 9d 2c | integration...,|
000000f0 20 73 61 69 64 20 59 76 65 73 20 4d 65 72 73 63 | said Yves Mersc|
00000100 68 2c 20 45 43 42 20 45 78 65 63 75 74 69 76 65 |h, ECB Executive|
00000110 20 42 6f 61 72 64 20 6d 65 6d 62 65 72 2e 0a | Board member..|
0000011f
There are a lot of them, and they show up whenever I open the files in vim. I have tried using sed to remove them:
sed -i 's#?~@?##g' file.txt
But it did not work.
What are those symbols? How do I remove them either with bash or python?
Solution 1:[1]
Use iconv.
iconv -f UTF-8 -t ASCII test.txt
UTF-8 as the input format was a guess, but as @Fravadona and @Jesujoba ALABI pointed out in the comments, it is correct.
Note 4/5/2022 13:13: The test input file test.txt I made from the hexdump was malformed (endianness?), but the -c option discards any unknown input so it worked. Removed the -c option.
Output:
SEPA for cards is the next logical step in European retail payments integration, says Yves Mersch, Member of the Executive Board of the ECB.
The successful completion of SEPA further accelerates Europes financial integration, said Yves Mersch, ECB Executive Board member.
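Since the question also asks about Python, here is a rough sketch of the same idea. The byte sequences in the hex dump (e2 80 9c, e2 80 9d, e2 80 99) are the UTF-8 encodings of the typographic quotes U+201C, U+201D and U+2019, so the "symbols" are ordinary punctuation that the terminal is not decoding as UTF-8. The function names and the replacement table below are my own assumptions, not part of any library. The first function simply drops everything that is not ASCII, which matches the output shown above. The second one swaps a few common characters for ASCII look-alikes first, so the quotes and apostrophes survive.

```python
# Sample bytes modelled on the hex dump in the question (shortened).
raw = b"\xe2\x80\x9cSEPA for cards\xe2\x80\x9d, Europe\xe2\x80\x99s"

# Hypothetical replacement table; extend it for other characters in your files.
ASCII_EQUIVALENTS = str.maketrans({
    "\u201c": '"',  # left double quotation mark  (e2 80 9c)
    "\u201d": '"',  # right double quotation mark (e2 80 9d)
    "\u2018": "'",  # left single quotation mark  (e2 80 98)
    "\u2019": "'",  # right single quotation mark (e2 80 99)
    "\u2013": "-",  # en dash (e2 80 93)
    "\u2014": "-",  # em dash (e2 80 94)
})


def strip_non_ascii(data: bytes) -> str:
    """Decode UTF-8 and silently drop every character that is not ASCII."""
    return data.decode("utf-8").encode("ascii", errors="ignore").decode("ascii")


def to_ascii_punctuation(data: bytes) -> str:
    """Replace known typographic characters, then drop any remaining non-ASCII."""
    text = data.decode("utf-8").translate(ASCII_EQUIVALENTS)
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    print(strip_non_ascii(raw))
    print(to_ascii_punctuation(raw))
```

For a real file you would read the bytes with something like open("file.txt", "rb").read() (the file name is only an example) and write the result back out. Note that decode("utf-8") raises UnicodeDecodeError if the file is not valid UTF-8, which is a useful check that the encoding guess was right.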
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

