Spark - createDataFrame returns NPE
I'm trying to run these lines:

```java
dsFinalSegRfm.show(20, false);

Long compilationTime = System.currentTimeMillis() / 1000;

JavaRDD<CustomerKnowledgeEntity> customerKnowledgeList = dsFinalSegRfm.javaRDD().map(
    (Function<Row, CustomerKnowledgeEntity>) rowRfm -> {
        CustomerKnowledgeEntity customerKnowledge = new CustomerKnowledgeEntity();
        customerKnowledge.setCustomerId(new Long(getString(rowRfm.getAs("CLI_ID"))));
        customerKnowledge.setKnowledgeType("rfm-segmentation");
        customerKnowledge.setKnowledgeTypeId("default");

        InformationsEntity infos = new InformationsEntity();
        infos.setCreationDate(new Date());
        infos.setModificationDate(new Date());
        infos.setUserModification("addKnowledge");
        customerKnowledge.setInformations(infos);

        List<KnowledgeEntity> knowledgeEntityList = new ArrayList<>();
        List<WrappedArray<String>> segList = rowRfm.getList(rowRfm.fieldIndex("SEGS"));
        for (WrappedArray<String> seg : segList) {
            KnowledgeEntity knowledge = new KnowledgeEntity();
            Map<String, Object> attr = new HashMap<>();
            attr.put("segment", seg.apply(1));
            attr.put("segmentSemester", seg.apply(2));
            knowledge.setKnowledgeId(seg.apply(0));
            knowledge.setAttributes(attr);
            knowledge.setPriority(0);
            knowledge.setCount(1);
            knowledge.setDeleted(false);
            knowledgeEntityList.add(knowledge);
        }
        customerKnowledge.setKnowledgeCollections(knowledgeEntityList);

        return customerKnowledge;
    });

Long dataConstructionTime = System.currentTimeMillis() / 1000;

Dataset<Row> dataset = sparkSession
    .createDataFrame(customerKnowledgeList, CustomerKnowledgeEntity.class)
    .repartition(16)
    .cache();
```
The dsFinalSegRfm.show(20, false) call prints the rows I expect, but I'm getting a NullPointerException from the createDataFrame method.

I'm learning Spark and I find it very opaque to debug. Any help is appreciated!
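For context, here is how I understand the call I'm making: createDataFrame(JavaRDD, Class) infers the schema from the bean class via JavaBean reflection, so (as far as I've read) the class needs a public no-arg constructor, getters/setters, and fields of types Spark can map. Below is a minimal, self-contained sketch of that same call pattern on a hypothetical SimpleCustomer bean (not my real CustomerKnowledgeEntity), just to show the pattern I'm modelling my code on:

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BeanDataFrameSketch {

    // Hypothetical minimal bean: public no-arg constructor plus
    // getters/setters, with simple field types.
    public static class SimpleCustomer implements Serializable {
        private Long customerId;
        private String knowledgeType;

        public SimpleCustomer() {}

        public Long getCustomerId() { return customerId; }
        public void setCustomerId(Long customerId) { this.customerId = customerId; }
        public String getKnowledgeType() { return knowledgeType; }
        public void setKnowledgeType(String knowledgeType) { this.knowledgeType = knowledgeType; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bean-dataframe-sketch")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        SimpleCustomer customer = new SimpleCustomer();
        customer.setCustomerId(42L);
        customer.setKnowledgeType("rfm-segmentation");

        JavaRDD<SimpleCustomer> rdd = jsc.parallelize(Arrays.asList(customer));

        // Same call pattern as in my code above, on the simplified bean.
        Dataset<Row> df = spark.createDataFrame(rdd, SimpleCustomer.class);
        df.show(false);

        spark.stop();
    }
}
```

If the problem is something in CustomerKnowledgeEntity that doesn't fit this bean pattern (for example the Map<String, Object> attributes or the nested entity lists), a pointer in that direction would already help.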
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow