How to get vocabulary after Sklearn ColumnTransformer
I want to get the vocabulary of each vectorizer after fitting a ColumnTransformer.
This is my code:
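from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

features = df[["content", "numeric1", "numeric2"]]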
results = df["label"]
features = features.to_numpy()
results = results.to_numpy()
# Creating vectorizer
transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features=2500, lowercase=True), 0),
    ('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features=10000, lowercase=True), 0),
    ('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features=4500, lowercase=True), 0),
    ('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features=750, lowercase=True), 0)],
    remainder='passthrough'
)
print(transformerVectoriser.vocabulary_)
I'm getting this error:
AttributeError: 'ColumnTransformer' object has no attribute 'vocabulary_'
I have also tried this:
features = transformerVectoriser.fit_transform(features)
print(features.vocabulary_)
But I'm getting this error:
raise AttributeError(attr + " not found")
AttributeError: vocabulary_ not found
I have also tried this:
transformerVectoriser.fit(features)
print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0].vocabulary_)
Error: AttributeError: 'tuple' object has no attribute 'vocabulary_'
And this:
transformed_features = transformerVectoriser.fit_transform(features)
print("Stem vocabulary:")
print(transformed_features.transformers_[0].vocabulary_)
Error: AttributeError: transformers_ not found
Solution 1:[1]
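Each of the four individual transformers in your ColumnTransformer has its own vocabulary, so the ColumnTransformer itself has no vocabulary_ attribute, and neither does the matrix returned by fit_transform. After fitting, transformerVectoriser.transformers_ is a list of (name, fitted_transformer, columns) tuples, which is why transformers_[0].vocabulary_ fails on a tuple. Index into each tuple with [1] to reach the fitted vectorizer, i.e.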
transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features=2500, lowercase=True), 0),
    ('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features=10000, lowercase=True), 0),
    ('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features=4500, lowercase=True), 0),
    ('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features=750, lowercase=True), 0)],
    remainder='passthrough'
)
transformerVectoriser.fit(features)
# or transformed_features = transformerVectoriser.fit_transform(features)
print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0][1].vocabulary_)
print("~~")
print("Word vocabulary:")
print(transformerVectoriser.transformers_[1][1].vocabulary_)
print("~~")
print("Bigram vocabulary:")
print(transformerVectoriser.transformers_[2][1].vocabulary_)
print("~~")
print("Trigram vocabulary:")
print(transformerVectoriser.transformers_[3][1].vocabulary_)
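If you would rather not rely on positional indexes, the fitted transformers can also be looked up by the names given in the transformers list, via named_transformers_. The following is only a minimal sketch: the toy DataFrame, the smaller n-gram ranges, and the variable names ct, vocabularies and word_vocab are made up for illustration and are not taken from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "content": ["the cat sat", "the dog barked", "a cat and a dog"],
    "numeric1": [1.0, 2.0, 3.0],
    "numeric2": [0.5, 0.1, 0.9],
})
features = df[["content", "numeric1", "numeric2"]].to_numpy()

# Smaller configuration than in the question, just to keep the output readable.
ct = ColumnTransformer(
    transformers=[
        ("vector_char", TfidfVectorizer(analyzer="char", ngram_range=(2, 3)), 0),
        ("vector_word_1", TfidfVectorizer(analyzer="word", ngram_range=(1, 1)), 0),
    ],
    remainder="passthrough",
)
ct.fit(features)

# Collect every vocabulary in one dict, keyed by transformer name.
# The 'remainder' entry has no vocabulary_ attribute, so it is skipped.
vocabularies = {
    name: transformer.vocabulary_
    for name, transformer, _columns in ct.transformers_
    if hasattr(transformer, "vocabulary_")
}

# Look up a single fitted vectorizer by name instead of by position.
word_vocab = ct.named_transformers_["vector_word_1"].vocabulary_

print(sorted(vocabularies))
print(sorted(word_vocab))
```

Each vocabulary_ maps a term to its column index within that vectorizer's own output, not within the combined matrix. On scikit-learn 1.0 or later, ct.get_feature_names_out() should give the names of the columns of the combined output in order, each prefixed with its transformer name.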
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
