How to check whether an English word is meaningful in Julia?

In Julia, how can I check whether an English word is meaningful? Suppose I want to know whether "Hello" is a meaningful word. In Python, one can use the enchant or nltk packages (examples: [1], [2]). Is it possible to do this in Julia as well?

What I need is a function like this:

is_english("Hello")
>>>true

is_english("Hlo")
>>>false
# Because it has no meaning: there is no such word in the English vocabulary.

is_english("explicit")
>>>true

is_english("eeplicit")
>>>false

Here is what I've tried so far:
I have a dataset of frequent five-letter English words (link to Google Drive), which I've attached to the question to make it clearer. The dataset is not adequate, because it contains only frequent five-letter words rather than all meaningful English words of any length, but it is enough to show what I want:

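using CSV
using DataFrames

# Load the word list; the words are stored in the column named "0".
df = CSV.read("frequent_5_char_words.csv", DataFrame, skipto=2)

# Keep the lowercased words in a Set so membership tests are fast.
const WORDS = Set(lowercase(item) for item in df[:, "0"])

function is_english(word::AbstractString)::Bool
    return lowercase(word) in WORDS
end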

Then when I try these:

julia> is_english("Helo")
false

julia> is_english("Hello")
true

But my dataset is not comprehensive, so this isn't enough. Are there any Julia packages like the Python ones I mentioned above?



Solution 1:[1]

(not enough rep to post a comment!)

You can still use NLTK in Julia via PyCall. Or, since it seems you don't need an NLP tool but just a dictionary, you can use Wiktionary to look words up or to build the dataset.
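The dictionary route could look roughly like the sketch below. It assumes you have a plain-text word list with one word per line; the path shown is only an example (many Unix-like systems ship such a file, and a list exported from Wiktionary would work the same way). The names `load_wordlist` and `wordlist_path` are hypothetical helpers, not part of any package.

```julia
# Assumed location of a one-word-per-line list; adjust to wherever your list lives.
wordlist_path = "/usr/share/dict/words"

# Read the file into a Set of lowercased words so lookups are fast.
# Blank lines are skipped.
function load_wordlist(path::AbstractString)
    return Set(lowercase(strip(line)) for line in eachline(path) if !isempty(strip(line)))
end

# Case-insensitive membership test against a previously loaded word set.
is_english(word::AbstractString, words::AbstractSet{<:AbstractString}) =
    lowercase(word) in words

# Only run the example lookup if the assumed file actually exists.
if isfile(wordlist_path)
    words = load_wordlist(wordlist_path)
    println(is_english("Hello", words))
end
```

Loading the list once and passing the Set around avoids rereading the file on every call. The quality of the answer depends entirely on how complete the word list is.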

Solution 2:[2]

There is a fairly new package named LanguageDetect.jl. It does not return true/false but a list of probabilities. You could define something like:

using LanguageDetect: detect

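# Returns true if the detected probability of English is at least `threshold`.
function is_english(text, threshold=0.8)
  langs = detect(text)
  for lang in langs
    if lang.language == "en"
      return lang.probability >= threshold
    end
  end
  return false  # English was not among the detected languages
end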



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Isia S.
Solution 2 longemen3000