Concatenate values from non-adjacent objects based on multiple matching criteria

I previously received help with a related question on this forum and am wondering whether there is a similarly straightforward way to solve a more complex problem.

Given the following snippet, is there a way to merge a partial sentence (one whose .Text does not end with a "[punctuation mark][white space]" pattern) with its remainder, identified by a matching TextSize? When I tried to adapt the answer from the related question I quickly ran into problems, but essentially I want to translate a rule such as: if .Text does not end with "[punctuation mark][white space]", then append the .Text of the next object whose .TextSize matches.

  {
    "Text": "Was it political will that established social democratic policies in the 1930s and ",
    "Path": "P",
    "TextSize": 9
  },
  {
    "Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
    "Path": "Footnote",
    "TextSize": 8
  },
  {
    "Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
    "Path": "P",
    "TextSize": 8
  },
  {
    "Text": "1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
    "Path": "P",
    "TextSize": 9
  },
  {
    "Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
    "Path": "P",
    "TextSize": 9
  }

The result I'm looking for would be as follows:

  {
    "Text": "Was it political will that established social democratic policies in the 1930s and 1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
    "Path": "P",
    "TextSize": 9
  },
  {
    "Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
    "Path": "Footnote",
    "TextSize": 8
  },
  {
    "Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
    "Path": "P",
    "TextSize": 8
  },
  {
    "Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
    "Path": "P",
    "TextSize": 9
  }


Solution 1 [1]

The following program assumes the input is a valid JSON array (the objects in the question would need to be wrapped in [ ]). It merges each incomplete .Text with at most one successor, but it can easily be modified to keep merging until the text is complete, as shown in Part 2 below.

Part 1

# input and output: an array of {Text, Path, TextSize} objects.
# Attempt to merge the .Text of the $i-th object with the .Text of a subsequent compatible object.
# If a merge is successful, the subsequent object is removed.
def attempt_to_merge_next($i):
  .[$i].TextSize as $class
  | first( (range($i+1; length) as $j | select(.[$j].TextSize == $class) | $j) // null) as $j
  | if $j then .[$i].Text += .[$j].Text | del(.[$j])
    else .
    end;

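# Main program. Because deletions shorten the array while range(0; length)
# still spans the original length, .[$i] can be null near the end.
reduce range(0; length) as $i (.;
  if .[$i] == null then .
  elif .[$i].Text|test("[,.?:;]\\s*$")|not   # .Text is an incomplete sentence
  then attempt_to_merge_next($i)
  else .
  end)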
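To check which objects this program treats as fragments, the end-of-sentence test can be run on its own. This is a minimal sketch, assuming jq is on PATH; the three sample strings are shortened versions of texts from the question.

```shell
#!/bin/sh
# Apply the same regex used in Part 1 to three sample strings.
# true  = ends with one of , . ? : ; plus optional whitespace (complete)
# false = treated as a fragment to be merged
result=$(jq -nc '["social democratic policies in the 1930s and ", "of income/). ", "after the 1970s? "] | map(test("[,.?:;]\\s*$"))')
printf '%s\n' "$result"
```

This prints [false,true,true]. Note that the character class also counts ",", ":" and ";" as sentence ends, while "!" is not included, so a text ending in "! " would be treated as a fragment; adjust the class if that is not the intended rule.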

Part 2

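Using the attempt_to_merge_next def from Part 1, the following program keeps merging at each position until the text is complete or no later object with the same TextSize remains: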

def merge:
  def m($i):
    if $i >= length then .
    elif .[$i].Text|test("[,.?:;]\\s*$")|not
    then attempt_to_merge_next($i) as $x
      | if ($x|length) == length then m($i+1)   # nothing was merged: move on
        else $x|m($i)                           # merged: retry at the same index
        end
    else m($i+1)
    end;
  m(0);

merge
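The difference between the two parts shows up when a sentence is split across three objects. The sketch below assumes jq is on PATH, reuses the defs from Parts 1 and 2, wraps the Part 1 reduce in a def named merge_once (a hypothetical name introduced only for this comparison), and runs both on a small made-up input.

```shell
#!/bin/sh
# Made-up input: "one " and "two " are fragments of the same TextSize-9 sentence.
input='[
  {"Text": "one ",    "Path": "P",        "TextSize": 9},
  {"Text": "note. ",  "Path": "Footnote", "TextSize": 8},
  {"Text": "two ",    "Path": "P",        "TextSize": 9},
  {"Text": "three. ", "Path": "P",        "TextSize": 9}
]'

program='
def attempt_to_merge_next($i):
  .[$i].TextSize as $class
  | first( (range($i+1; length) as $j | select(.[$j].TextSize == $class) | $j) // null) as $j
  | if $j then .[$i].Text += .[$j].Text | del(.[$j])
    else .
    end;

# Part 1 (hypothetical name merge_once): each fragment absorbs at most one successor.
def merge_once:
  reduce range(0; length) as $i (.;
    if .[$i] == null then .
    elif .[$i].Text|test("[,.?:;]\\s*$")|not
    then attempt_to_merge_next($i)
    else .
    end);

# Part 2: retry at the same index until the text is complete or nothing can be merged.
def merge:
  def m($i):
    if $i >= length then .
    elif .[$i].Text|test("[,.?:;]\\s*$")|not
    then attempt_to_merge_next($i) as $x
      | if ($x|length) == length then m($i+1)
        else $x|m($i)
        end
    else m($i+1)
    end;
  m(0);

# Report only the resulting texts of each approach.
{once: (merge_once | map(.Text)), full: (merge | map(.Text))}
'

result=$(printf '%s\n' "$input" | jq -c "$program")
printf '%s\n' "$result"
```

This prints {"once":["one two ","note. ","three. "],"full":["one two three. ","note. "]}: Part 1 leaves "one two " as a fragment because it absorbs only one successor, whereas merge keeps retrying at index 0 until the text ends in punctuation.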

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1, Stack Overflow
