How to remove unwanted symbols from text file

I have some text files with unwanted symbols such as

?~, ?~@?, -?~, ?~H~Z, ?~@~S, ?~@~T, : ?~@~], ?, etc.


The actual text:

~@~\SEPA for cards is the next logical step in European retail payments integration~@~], says Yves Mersch, Member of the Executive Board of the ECB.
~@~\Giuseppe Penone's tree conveys a sense of stability and growth and is rooted in the humanist values of Europe in the most beautiful way~@~], said Benoît C~Suré, Member of the Executive Board of the ECB and chair of the art jury which selected the artwork.
~@~\The euro banknotes and coins in everyone~@~Ys wallets are the same in the whole euro area.
~@~\The introduction of the new ~B50 will make our currency even safer~@~], Yves Mersch, ECB Executive Board member, said.
~@~\The successful completion of SEPA further accelerates Europe~@~Ys financial integration~@~], said Yves Mersch, ECB Executive Board member.

The hex dump:

00000000  e2 80 9c 53 45 50 41 20  66 6f 72 20 63 61 72 64  |...SEPA for card|
00000010  73 20 69 73 20 74 68 65  20 6e 65 78 74 20 6c 6f  |s is the next lo|
00000020  67 69 63 61 6c 20 73 74  65 70 20 69 6e 20 45 75  |gical step in Eu|
00000030  72 6f 70 65 61 6e 20 72  65 74 61 69 6c 20 70 61  |ropean retail pa|
00000040  79 6d 65 6e 74 73 20 69  6e 74 65 67 72 61 74 69  |yments integrati|
00000050  6f 6e e2 80 9d 2c 20 73  61 79 73 20 59 76 65 73  |on..., says Yves|
00000060  20 4d 65 72 73 63 68 2c  20 4d 65 6d 62 65 72 20  | Mersch, Member |
00000070  6f 66 20 74 68 65 20 45  78 65 63 75 74 69 76 65  |of the Executive|
00000080  20 42 6f 61 72 64 20 6f  66 20 74 68 65 20 45 43  | Board of the EC|
00000090  42 2e 0a e2 80 9c 54 68  65 20 73 75 63 63 65 73  |B.....The succes|
000000a0  73 66 75 6c 20 63 6f 6d  70 6c 65 74 69 6f 6e 20  |sful completion |
000000b0  6f 66 20 53 45 50 41 20  66 75 72 74 68 65 72 20  |of SEPA further |
000000c0  61 63 63 65 6c 65 72 61  74 65 73 20 45 75 72 6f  |accelerates Euro|
000000d0  70 65 e2 80 99 73 20 66  69 6e 61 6e 63 69 61 6c  |pe...s financial|
000000e0  20 69 6e 74 65 67 72 61  74 69 6f 6e e2 80 9d 2c  | integration...,|
000000f0  20 73 61 69 64 20 59 76  65 73 20 4d 65 72 73 63  | said Yves Mersc|
00000100  68 2c 20 45 43 42 20 45  78 65 63 75 74 69 76 65  |h, ECB Executive|
00000110  20 42 6f 61 72 64 20 6d  65 6d 62 65 72 2e 0a     | Board member..|
0000011f

There are a lot of them, and they show up whenever I open the files in vim. I have tried removing them with sed:

sed -i 's#?~@?##g' file.txt

But it did not work. What are those symbols, and how do I remove them with either bash or Python?



Solution 1:[1]

Use iconv.

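iconv -c -f UTF-8 -t ASCII test.txt

UTF-8 as the input encoding was a guess, but as @Fravadona and @Jesujoba ALABI pointed out in the comments, it is correct: in the hex dump, e2 80 9c, e2 80 9d and e2 80 99 are the UTF-8 encodings of the curly quotes “ and ” and the typographic apostrophe ’. Those multi-byte sequences are what show up as the unwanted symbols.

Note 4/5/2022 13:13: The test input file test.txt I made from the hex dump was malformed at first. The -c option is what makes iconv discard characters that have no ASCII equivalent; without it, iconv stops with an error at the first curly quote instead of producing the output below.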

Output:

SEPA for cards is the next logical step in European retail payments integration, says Yves Mersch, Member of the Executive Board of the ECB.
The successful completion of SEPA further accelerates Europes financial integration, said Yves Mersch, ECB Executive Board member.
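Since the question also asks about Python, here is a minimal sketch of the same idea, assuming the files are UTF-8 (as the hex dump shows). `drop_non_ascii` mirrors `iconv -c -f UTF-8 -t ASCII` by encoding with `errors="ignore"`, so "Europe’s" becomes "Europes" exactly as in the output above. `asciify` first maps a few typographic characters to ASCII look-alikes so the apostrophe and quotes survive. The helper names (`drop_non_ascii`, `asciify`, `describe_non_ascii`, `clean_file`) and the contents of `ASCII_LOOKALIKES` are hypothetical choices for illustration, not part of the original answer.

```python
import unicodedata
from pathlib import Path

# Assumed mapping: common typographic punctuation -> ASCII look-alike.
# Extend it for any other characters your files contain.
ASCII_LOOKALIKES = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (typographic apostrophe)
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
})


def drop_non_ascii(text: str) -> str:
    """Remove every non-ASCII character, like `iconv -c -f UTF-8 -t ASCII`."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def asciify(text: str) -> str:
    """Map known typographic characters to ASCII, then drop anything left over."""
    return drop_non_ascii(text.translate(ASCII_LOOKALIKES))


def describe_non_ascii(text: str) -> list:
    """Return one line per distinct non-ASCII character: UTF-8 bytes, code point, name."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [
        f"{ch.encode('utf-8').hex(' ')}  U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}"
        for ch in found
    ]


def clean_file(src: str, dst: str, keep_quotes: bool = True) -> None:
    """Hypothetical helper: read a UTF-8 file and write an ASCII-only copy."""
    text = Path(src).read_text(encoding="utf-8")
    cleaned = asciify(text) if keep_quotes else drop_non_ascii(text)
    Path(dst).write_text(cleaned, encoding="ascii")


if __name__ == "__main__":
    # Second sentence from the hex dump, written with Unicode escapes.
    line = ("\u201cThe successful completion of SEPA further accelerates "
            "Europe\u2019s financial integration\u201d, said Yves Mersch, "
            "ECB Executive Board member.")
    print(drop_non_ascii(line))  # same text as the iconv -c output
    print(asciify(line))         # keeps Europe's and straight quotes
    for entry in describe_non_ascii(line):
        print(entry)             # e.g. e2 80 99  U+2019  RIGHT SINGLE QUOTATION MARK
```

`describe_non_ascii` answers "what are those symbols?" for any other byte sequences in the files by listing their UTF-8 bytes and Unicode names. To clean a file in place of the iconv call, something like `clean_file("file.txt", "file_clean.txt")` would do; pass `keep_quotes=False` to match iconv's behaviour of simply dropping the characters.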

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1