How to include custom attributes in spaCy's Doc.from_docs function?

spaCy 3 has a method, Doc.from_docs() (here are links to the code and the high-level documentation), which would come in handy for a project I'm working on. It concatenates a list of Doc objects into a single Doc object.

On a high level, here's what the method does:

  1. convert all input Doc objects to numpy arrays via Doc.to_array()
  2. concatenate the resulting arrays
  3. create a single Doc object from the array produced in step (2) via Doc.from_array()
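
The three steps above can be sketched by hand roughly as follows. This is only an illustration of the idea, not the actual body of Doc.from_docs(): the choice of attributes (LEMMA and NORM), the blank English pipeline, the example sentences and the manually assigned lemmas are all assumptions made for the example.

```python
import numpy
import spacy
from spacy.attrs import LEMMA, NORM
from spacy.tokens import Doc

# Assumption: a blank pipeline, so no trained model has to be installed.
nlp = spacy.blank("en")
docs = [nlp("Hello world. "), nlp("Another text.")]

# Assumption: lemmas are set by hand so there is something to carry over.
for doc in docs:
    for token in doc:
        token.lemma_ = token.text.lower()

attrs = [LEMMA, NORM]

# Step 1: export the chosen attributes of every Doc as a numpy array.
arrays = [doc.to_array(attrs) for doc in docs]

# Step 2: stack the per-Doc arrays into one array (one row per token).
combined = numpy.concatenate(arrays)

# Step 3: build a new Doc from the words and load the attributes back in.
words = [token.text for doc in docs for token in doc]
spaces = [bool(token.whitespace_) for doc in docs for token in doc]
merged = Doc(nlp.vocab, words=words, spaces=spaces)
merged.from_array(attrs, combined)
```

Nothing in this round trip touches values stored under Doc._., which is consistent with them not showing up on the concatenated Doc.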

I would like to find out how to make from_docs also take custom attributes into account. Currently, native spaCy attributes such as "POS" or "DEP" are considered, and their values are transferred from the input Doc objects to the resulting concatenated Doc object. However, any custom extension attributes set on the Doc itself (i.e. Doc._.*) are lost when executing this method.

Does anyone know how to include custom attributes in the Doc.from_docs() method?

Thank you for any hints.



Solution 1:[1]

Doc.from_docs only includes Token and Span extensions.

(What value should the final merged doc include for doc1._.ext = True + doc2._.ext = False?)
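
Since there is no single obvious way to combine Doc-level values, one possible workaround is to merge them yourself after calling Doc.from_docs(). A minimal sketch, assuming a hypothetical boolean extension named ext (as in the example above) and an "any input was True" merge policy, which is just one choice among several:

```python
import warnings

import spacy
from spacy.tokens import Doc

# Hypothetical Doc-level extension, mirroring the `ext` example above.
Doc.set_extension("ext", default=False, force=True)

nlp = spacy.blank("en")  # blank pipeline, no trained model needed
doc1 = nlp("First text.")
doc2 = nlp("Second text.")
doc1._.ext = True
doc2._.ext = False

docs = [doc1, doc2]
with warnings.catch_warnings():
    # from_docs may warn about the Doc-level values it does not carry over.
    warnings.simplefilter("ignore")
    merged = Doc.from_docs(docs)

# Assumed merge policy: the merged value is True if any input Doc had it set.
merged._.ext = any(doc._.ext for doc in docs)
```

Other policies (all(), collecting the values into a list, keeping only the first Doc's value) would slot into the last line in the same way; which one is right depends on what the attribute means in your project.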

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aab