Spark - createDataFrame returns NPE

I'm trying to run these lines:

dsFinalSegRfm.show(20, false);

Long compilationTime = System.currentTimeMillis() / 1000;

JavaRDD<CustomerKnowledgeEntity> customerKnowledgeList = dsFinalSegRfm.javaRDD().map(
    (Function<Row, CustomerKnowledgeEntity>) rowRfm -> {
        CustomerKnowledgeEntity customerKnowledge = new CustomerKnowledgeEntity();

        customerKnowledge.setCustomerId(Long.valueOf(getString(rowRfm.getAs("CLI_ID"))));
        customerKnowledge.setKnowledgeType("rfm-segmentation");
        customerKnowledge.setKnowledgeTypeId("default");

        InformationsEntity infos = new InformationsEntity();
        infos.setCreationDate(new Date());
        infos.setModificationDate(new Date());
        infos.setUserModification("addKnowledge");
        customerKnowledge.setInformations(infos);

        List<KnowledgeEntity> knowledgeEntityList = new ArrayList<>();
        List<WrappedArray<String>> segList = rowRfm.getList(rowRfm.fieldIndex("SEGS"));
        for (WrappedArray<String> seg : segList) {
            KnowledgeEntity knowledge = new KnowledgeEntity();
            Map<String, Object> attr = new HashMap<>();

            attr.put("segment", seg.apply(1));
            attr.put("segmentSemester", seg.apply(2));

            knowledge.setKnowledgeId(seg.apply(0));
            knowledge.setAttributes(attr);
            knowledge.setPriority(0);
            knowledge.setCount(1);
            knowledge.setDeleted(false);

            knowledgeEntityList.add(knowledge);
        }
        customerKnowledge.setKnowledgeCollections(knowledgeEntityList);

        return customerKnowledge;
    });
Long dataConstructionTime = System.currentTimeMillis() / 1000;
Dataset<Row> dataset = sparkSession
    .createDataFrame(customerKnowledgeList, CustomerKnowledgeEntity.class)
    .repartition(16)
    .cache();

The call dsFinalSegRfm.show(20, false); prints the rows I expect (screenshot not reproduced here).

But I'm getting a NullPointerException from the createDataFrame method.

I'm learning Spark, but I find it very opaque to debug. Any help is appreciated!
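
One possible way to narrow this down (a minimal sketch, not a confirmed fix): as far as I understand, createDataFrame(rdd, beanClass) derives its schema from the bean's getters through JavaBeans introspection, not from the data in the RDD. Listing those getters with their declared types, outside Spark, can show which properties the schema inference has to handle. Properties with loosely typed values such as Map<String, Object>, or nested entity classes, are candidates worth checking first. BeanPropertyLister and SampleBean are hypothetical names; SampleBean only stands in for CustomerKnowledgeEntity, whose source is not shown in the question.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BeanPropertyLister {

    // Hypothetical stand-in for CustomerKnowledgeEntity (the real class is not shown).
    public static class SampleBean {
        private Long customerId;
        private Map<String, Object> attributes;

        public Long getCustomerId() { return customerId; }
        public void setCustomerId(Long customerId) { this.customerId = customerId; }
        public Map<String, Object> getAttributes() { return attributes; }
        public void setAttributes(Map<String, Object> attributes) { this.attributes = attributes; }
    }

    // Returns one "name : declared type" line per readable bean property,
    // skipping the implicit "class" property inherited from Object.
    public static List<String> describe(Class<?> beanClass) {
        List<String> lines = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null || "class".equals(pd.getName())) {
                    continue; // write-only property or Object.getClass()
                }
                lines.add(pd.getName() + " : " + getter.getGenericReturnType().getTypeName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass.getName(), e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Replace SampleBean.class with the real entity class to inspect it.
        describe(SampleBean.class).forEach(System.out::println);
    }
}
```

Running describe on CustomerKnowledgeEntity, and then on each nested entity type it reports (InformationsEntity, KnowledgeEntity), gives a short list of property types to compare against what Spark's bean encoder is documented to support.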



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
