Best and most efficient way to differentiate and assign types to timestamp tokens

I am working on timestamp parsing. The timestamps are tokenized in Python using the spaCy library, and the tokens are then sent from Python to Java. What I am trying to do now is assign each token a 'time type' describing what it is: a potential date gets the type 'date', a year gets 'year', and so on.

What I am trying to figure out is an efficient way to check each token for what it could be. The challenge is differentiating a date from a month in short format, and from a year as well. The basic approach I can think of is running multiple checks, but I would then need to make deductions to tell dates and months apart (and years, if they are in short format), depending on what the first token is assumed to be.

Once I have all the token types and their position indexes, I can use them to generate a pattern string for the DateTimeFormatter class, so that the timestamp can be parsed and saved to the DB. Is there another way to achieve this? I also need to handle timestamps in any possible format, such as '20111203'. Any help is appreciated!
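To make the "multiple checks plus deductions" idea concrete, here is a minimal Java sketch of one way it could look. Everything in it is an assumption for illustration: the class `TimeTokenTyper`, the `TimeType` enum (where `DAY` stands for the 'date' type above), and the helpers `candidates`, `classify`, and `toPattern` are hypothetical names, English month names are assumed, and only year, month, and day tokens are covered. Each token first gets every type it could be on its own, then types that are already settled elsewhere are eliminated from the ambiguous tokens.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; none of these names come from a library.
final class TimeTokenTyper {
    enum TimeType { YEAR, MONTH, DAY }

    // Assumption: month names are English.
    private static final List<String> MONTH_NAMES = List.of(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december");

    /** Step 1: collect every type a token could be when looked at in isolation. */
    static Set<TimeType> candidates(String token) {
        Set<TimeType> types = EnumSet.noneOf(TimeType.class);
        String lower = token.toLowerCase(Locale.ROOT);
        if (lower.length() >= 3 && MONTH_NAMES.stream().anyMatch(m -> m.startsWith(lower))) {
            types.add(TimeType.MONTH);                 // "Dec", "december"
        } else if (lower.matches("\\d{4}")) {
            types.add(TimeType.YEAR);                  // "2011"
        } else if (lower.matches("\\d{1,2}")) {
            int value = Integer.parseInt(lower);
            if (lower.length() == 2) types.add(TimeType.YEAR);          // short year
            if (value >= 1 && value <= 12) types.add(TimeType.MONTH);
            if (value >= 1 && value <= 31) types.add(TimeType.DAY);
        }
        return types;                                  // empty set means "unknown"
    }

    /** Step 2: once a token has a single type, remove that type from ambiguous tokens. */
    static List<Set<TimeType>> classify(List<String> tokens) {
        List<Set<TimeType>> result = new ArrayList<>();
        for (String token : tokens) result.add(candidates(token));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < result.size(); i++) {
                if (result.get(i).size() != 1) continue;
                TimeType settled = result.get(i).iterator().next();
                for (int j = 0; j < result.size(); j++) {
                    if (j != i && result.get(j).size() > 1 && result.get(j).remove(settled)) {
                        changed = true;
                    }
                }
            }
        }
        return result;
    }

    /** Step 3: build a DateTimeFormatter pattern; fails if a token is still ambiguous. */
    static String toPattern(List<String> tokens, List<Set<TimeType>> types, String separator) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (types.get(i).size() != 1) {
                throw new IllegalArgumentException("Ambiguous or unknown token: " + token);
            }
            boolean numeric = Character.isDigit(token.charAt(0));
            switch (types.get(i).iterator().next()) {
                case YEAR:
                    parts.add(token.length() == 4 ? "yyyy" : "yy");
                    break;
                case MONTH:
                    parts.add(numeric ? (token.length() == 2 ? "MM" : "M")
                                      : (token.length() == 3 ? "MMM" : "MMMM"));
                    break;
                default:
                    parts.add(token.length() == 2 ? "dd" : "d");
            }
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("25", "Dec", "2011");
        List<Set<TimeType>> types = classify(tokens);
        String pattern = toPattern(tokens, types, " ");
        LocalDate date = LocalDate.parse("25 Dec 2011",
                DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH));
        System.out.println(types + " -> " + pattern + " -> " + date);

        // A compact stamp like 20111203 has no separators to tokenize on. Assuming it
        // is yyyyMMdd, the built-in BASIC_ISO_DATE formatter parses it directly.
        System.out.println(LocalDate.parse("20111203", DateTimeFormatter.BASIC_ISO_DATE));
    }
}
```

With the tokens `25`, `Dec`, `2011`, the four-digit year settles first, which leaves `25` as a day and gives the pattern `dd MMM yyyy`. Input such as `03`, `04`, `05` stays ambiguous under these rules, so `toPattern` rejects it; resolving that case would need an extra assumption, for example a preferred field order per data source.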



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source