Apache Tika: correctly detecting MIME types in the new version
I'm trying to detect the MIME types of files from their byte content using Apache Tika. When detecting .docx or .xlsx files, Tika returned unfamiliar types such as application/x-tika-ooxml, and I couldn't find a complete list of these Tika-specific types anywhere, so I had no way to map them to standard MIME types.
Following this question helped: I was able to get standard MIME types such as application/msword instead of the Tika-specific ones. To do that, however, I had to downgrade from the latest version (2.3.0) to an older one (1.28.1), which is probably not the best approach.
So my question is: what is the correct way to detect MIME types with the newest version? It should be possible there too. I simply used the code from that question:
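import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
// Tika 1.x (1.28.1): pass the file name as a hint alongside the bytes
TikaConfig config = TikaConfig.getDefaultConfig();
Detector detector = config.getDetector();
TikaInputStream stream = TikaInputStream.get(fileOrStream);
Metadata metadata = new Metadata();
metadata.add(Metadata.RESOURCE_NAME_KEY, filenameWithExtension);
MediaType mediaType = detector.detect(stream, metadata);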
The problem with the new version is that Metadata.RESOURCE_NAME_KEY no longer exists. I also added the tika-parsers dependency at the same version; is that still necessary in the new version?
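For comparison, here is a minimal sketch of the same detection against Tika 2.x. It assumes the resource-name key now lives in TikaCoreProperties (as TikaCoreProperties.RESOURCE_NAME_KEY) rather than in Metadata. The class name MimeSniffer and the helper detect are hypothetical names used only for illustration. Whether .docx or .xlsx content resolves to a specific Office type rather than application/x-tika-ooxml may also depend on the parser modules being on the classpath; as far as I know, the single tika-parsers artifact was replaced in 2.x by tika-parsers-standard-package, so treat that as something to verify for your setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.mime.MediaType;

public class MimeSniffer {

    // Hypothetical helper: detects the media type from the raw bytes,
    // using the file name only as an additional hint.
    static String detect(byte[] content, String fileName) throws IOException {
        Detector detector = TikaConfig.getDefaultConfig().getDetector();

        Metadata metadata = new Metadata();
        // Assumption: in Tika 2.x this constant replaces Metadata.RESOURCE_NAME_KEY.
        metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, fileName);

        // TikaInputStream supports mark/reset, which detectors rely on.
        try (TikaInputStream stream = TikaInputStream.get(content)) {
            MediaType mediaType = detector.detect(stream, metadata);
            return mediaType.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "plain text sample".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(sample, "sample.txt"));
    }
}
```

The file name is optional: passing only the bytes still works, but the hint lets the detector narrow a generic container type down to a more specific one when the content alone is ambiguous.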
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
