'Mimetype check using Tika jars

I am developing standard alone Java batch process. I am trying to determine file attachment mimetype using Tika Jars. I am using Tika 1.4 Jar files.

My code look like

Parser parser= new AutoDetectParser();
InputStream stream = new FileInputStream(fileAttachment);
int writerHandler =-1;
ContentHandler contentHandler= new BodyContentHandler(writerHandler);
Metadata metadata= new Metadata();
parser.parse(stream, contentHandler, metadata, new ParseContext());
String mimeType = metadata.get(Metadata.CONTENT_TYPE);
logger.debug("File Attachment: "+fileattachment.getName()+" MimeType is: "+mimeType);

This code is not working properly for the office 03 and 07 documents.

While running from eclipse I am getting correct mimetypes.

I build jar file and running from command its giving wrong mimetypes.

out put from command
------------
File Attachment: Testpdf.pdf  MimeType is: application/pdf
File Attachment: Testpdf.tif  MimeType is: image/tiff
File Attachment: Testpdf.xlsx  MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.xltx  MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.pptx  MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.docx  MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.xls  MimeType is: application/zip
File Attachment: Testpdf.doc  MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.dot  MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.ppt  MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.xlt  MimeType is: application/vnd.ms-excel

I tried with OfficePraser, OOXMLParser. Its not working. I have tried with tika 0.9 jar files. mimeTypes are coming correctly but if any one of my file attachment is "editable pdf" my batch process is dying (like "exit(0);" in code). If I have new tika jars its giving wrong mimeTypes.

Please help me in this. Thanks in advance.

CVSR Sarma



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source