MuleSoft DataWeave streaming behaviour in case of objects

I've been trying to understand DataWeave streaming for collections and objects. For collections it works as expected. For example, take the payload below:

[
    {"row0" : "0"},
    {"row1" : "1"},
    {"row2" : "2"},
    {"row3" : "3"},
    {"row4" : "4"}
]  

If I try the following script

%dw 2.0
input payload application/json
output application/json deferred=true
---
[payload[2] , payload[1]]

I get the following output

[
    {
        "row2": "2"
    },
    {
        "row4": "4"
    }
]

It is evident from the above example that payload[1] returns { "row4": "4" }: evaluating payload[2] has already consumed the first three elements of the stream, so index 1 of what remains is the element at index 4 of the original payload.

But the same behaviour is not seen with JSON objects. Here is the example:

input payload

{
  "row0" : "0",
  "row1" : "1",
  "row2" : "2",
  "row3" : "3",
  "row4" : "4"
}

DataWeave script (same as the one used for the collection)

%dw 2.0
input payload application/json
output application/json deferred=true
---
[payload[2] , payload[1]]

This returns the following output:

[
    "2",
    "1"
]

But given the behaviour seen with the collection, shouldn't it return the following output?

[
    "2",
    "4"
]

because the element at index 2 of the original object has already been consumed, and the element that is now at index 1 of what remains is the element at index 4 of the original object.

Here is the sample flow


<flow name="streamingFlowObjects" doc:id="c5f23756-5083-4a7d-a173-edad1ad69c75" >
        <http:listener doc:name="Listener" doc:id="4bfde9f0-27ee-4e77-8c7a-4be0ffe8f488" config-ref="HTTP_Listener_config" path="/sync" outputMimeType="application/json; streaming=true">
        </http:listener>
        <logger level="INFO" doc:name="Logger" doc:id="c1bb4dde-c965-4d92-bb06-5e4fd2ae2166" message="#[output application/json --- 'Started ' ++ now()]"/>
        <ee:transform doc:name="Transform Message" doc:id="1fe3dda6-9aa7-4948-9c77-d8ec033ec029" >
            <ee:message >
                <ee:set-payload ><![CDATA[%dw 2.0
input payload application/json
output application/json deferred=true
---
[payload[1] , payload[0]]]]></ee:set-payload>
            </ee:message>
        </ee:transform>
        <logger level="INFO" doc:name="Logger" doc:id="31033cc4-1827-4960-9c64-588bda1989cf" message="#[output application/json --- 'Completed ' ++ now()]"/>
    </flow>

Can someone explain why there is such a difference between objects and arrays, or whether my understanding is incorrect?



Solution 1:[1]

First let's remember some definitions of streaming in DataWeave:

  • The basic unit of the stream is specific to the data format. The unit is a record in a CSV document, an element of an array in a JSON document, or a collection in an XML document.
  • Streaming accesses each unit of the stream sequentially. Streaming does not support random access to a document.

Then let's note that although you used the index selector in both cases, that doesn't mean the data is converted to a streamable collection. It only means that some kind of indexed access is performed.

It is clear that the first case is a payload that is an array, accessed as a stream. Then you can see the side effect of consuming the stream. It is interesting and useful to know that this can happen.

The second case is a single object, and indexed access is only returning elements from the same object, which is the unit of work and thus not streamable. The index selector is behaving as "random access to a document". That's the reason for the different behavior.
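
If the goal is to pick specific elements of a streamed array without depending on what has already been consumed, one option is to select them in a single forward pass. This is only a sketch: it assumes the payload is the JSON array from the question and that the reader is set to streaming=true. The indexes 1 and 2 are the ones from the question.

```dataweave
%dw 2.0
input payload application/json streaming=true
output application/json deferred=true
---
// One sequential pass over the array; payload is referenced only once.
// Matching elements come back in source order (index 1, then index 2),
// not in the order the indexes are listed in the condition.
payload filter ((item, index) -> index == 1 or index == 2)
```

Note that this returns the elements in the order they appear in the input, so it is not a drop-in replacement when you need [payload[2], payload[1]] in exactly that order.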

Solution 2:[2]

  1. You can try adding the @StreamCapable() annotation to the script. It will tell you whether the script is stream capable or not.
  2. Your second case is an object. Streaming works only on the elements of an array; the key-value pairs of an object are not streamable.
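
As a rough illustration of point 1, the annotation goes directly above the input directive it validates. This is a sketch only: the annotation is documented as experimental, the streaming=true reader property is my assumption about how the input is read, and the map body is just a placeholder transformation.

```dataweave
%dw 2.0
// Asks DataWeave to check that the script can process this input as a stream.
@StreamCapable()
input payload application/json streaming=true
output application/json deferred=true
---
// payload is referenced once and read front to back.
payload map ((item) -> item)
```

As far as I understand the check, a script that references payload more than once, such as [payload[2], payload[1]], should not pass this validation, which would point at the same problem described in Solution 1.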

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aled
Solution 2 Suraj Rao