Parsing table-like string into JavaScript object

This string is structured in a human-readable table-like way. It contains three columns. However, the only information I need is a list of all of the values from the first column.

app115                                115.115                              winget
app225                                115.115Chrome                        winget
Knotes                                1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget

Using JavaScript, how would I parse this string to get a result like this:

['app115', 'app225', 'Knotes', 'BPMN-RPA Studio', 'Fishing Funds', '3601', '3602', '3603', '3604', '3CX Call Flow Designer (.exe edition)']

Here are some of my ideas that I couldn't get to work:

Step 1: since the last two columns are not needed, start by replacing 'winget' with an empty string: string1.replaceAll("winget", ""). This removes the whole right column, because every value in that column is 'winget'.

Step 2: remove every run of characters that is surrounded by two or more spaces on each side. This should get rid of the whole second column, because each value has at least two spaces on each side. However, this will not work: if the value in the first column is too long, the value in the second column may have only one space before it. See the last row of the original string.

Last step: once the string looks something like "app115 app225 Knotes BPMN-RPA Studio Fishing Funds...", turn it into an array using string.split(" ").
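The failure described in step 2 can be reproduced with a small sketch. It simplifies the idea to "treat any run of two or more whitespace characters as a column separator". The `row` helper and the widths 38 and 37 are assumptions read off the sample table, not part of the original data.

```javascript
// Hypothetical helper: rebuilds a table row with the column widths
// observed in the sample (38 and 37 characters are assumptions).
const row = (name, id, source) => name.padEnd(38) + id.padEnd(37) + source;

const rows = [
  row("Knotes", "1MHz.Knotes", "winget"),
  // 37-character name: only ONE space is left before the second column.
  row("3CX Call Flow Designer (.exe edition)", "3CX.3CXCallFlowDesigner", "winget"),
];

// Naive approach: split each row on runs of two or more whitespace characters.
const firstCells = rows.map((r) => r.split(/\s{2,}/)[0]);

console.log(firstCells);
// → [ 'Knotes', '3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner' ]
```

The second entry still carries the second column's value, so a whitespace-based separator is not reliable for this table. A fixed column width is.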

Hope my question makes sense. Thanks for any help



Solution 1:[1]

This regex extracts the first column, which has a fixed width (38 characters here).
It is a template and can be adapted to extract any column.
It also trims leading and trailing whitespace.

(?<=^\s*(?!\s)).{1,38}(?<!\s)(?<=^.{1,38})|^(?=\s{38})

This is a single operation. The template is valid only in engines that support
variable-length lookbehind, such as JS and C#.

The regex is no more complex than putting together a password regex.

  (?<=               # Alignment using a look behind assertion
    ^ \s*              # Beginning of line, optional ws
    (?! \s )           # Not a ws forward
  )
  .{1,38}            # 1-38 characters width column
  (?<! \s )          # Look behind assertion for trailing ws trim
  (?<= ^ .{1,38} )   # Look behind assertion to fix overall length to 38
| 
  ^                  # Or the entire column is WS
  (?= \s{38} )       # Check with look ahead assertion

<script>
const table = `app115                                115.115                              winget
app225                                115.115Chrome                        winget
   Knotes                             1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
                                      1zilc.FishingFunds                   winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget
    Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget
         Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget`

// The g flag returns every match; the m flag makes ^ anchor at each line start.
// Rows whose first column is entirely whitespace yield an empty string.
const column1 = table.match( /(?<=^\s*(?!\s)).{1,38}(?<!\s)(?<=^.{1,38})|^(?=\s{38})/gm )
console.log(column1)
</script>
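For engines without variable-length lookbehind, or as a cross-check on the regex, the same fixed-width idea can be sketched with plain string slicing. This is a swapped-in technique, not the regex above: `firstColumn`, `row`, and the default width of 38 are assumptions made for illustration.

```javascript
// Sketch: take the first `width` characters of every line and trim the padding.
// `firstColumn` is a hypothetical helper; 38 is the width seen in the sample.
function firstColumn(table, width = 38) {
  return table
    .split(/\r?\n/)                               // one entry per table row
    .map((line) => line.slice(0, width).trim());  // cut the column, drop padding
}

// Hypothetical helper that pads cells to the sample's column widths.
const row = (name, id, source) => name.padEnd(38) + id.padEnd(37) + source;

const sample = [
  row("app115", "115.115", "winget"),
  row("   Knotes", "1MHz.Knotes", "winget"),          // leading whitespace
  row("", "1zilc.FishingFunds", "winget"),            // empty first column
  row("3CX Call Flow Designer (.exe edition)", "3CX.3CXCallFlowDesigner", "winget"),
].join("\n");

console.log(firstColumn(sample));
// → [ 'app115', 'Knotes', '', '3CX Call Flow Designer (.exe edition)' ]
```

Like the regex, this returns an empty string for a row whose first column is blank. Unlike the regex, it needs one pass to split lines and another to slice them.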

To generalize the above to get any column, just the column offset (in characters)
and the column width are needed. These can be plugged into this regex template:
(?:(?<=^.{N}\s*(?!\s)).{1,W}(?<!\s)(?<=^.{N}.{1,W})|(?<=^.{N})(?=\s{W}))
Where N is the Offset up to the column. W is the Width of the column.
In the linked example, N = 10, W = 38.

(?:
  (?<=               # Alignment using a look behind assertion
    ^ .{N} \s*         # Offset to column and leading ws trim
    (?! \s )           # Not a ws forward
  )
  .{1,W}             # 1 - width column characters
  (?<! \s )          # Not a ws behind for trailing ws trim
  (?<=               # Behind col offset and 1 - width, to fix overall length
    ^ .{N} .{1,W}   
  )
|                   # Or the entire column is WS
  (?<= ^ .{N} )      # Alignment behind offset to column
  (?= \s{W} )        # Ahead: ensure the entire column is ws
)
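One way to plug N and W into the template from JavaScript is to build the pattern with the RegExp constructor. The sketch below assumes non-negative integer arguments. `columnRegex` and `row` are hypothetical names, and the second column's position (offset 38, width 37) is read off the sample table rather than stated in the answer.

```javascript
// Hypothetical helper: fills the template with a column offset (N) and width (W).
function columnRegex(offset, width) {
  const skip = `.{${offset}}`; // the "^.{N}" part of the template, minus the anchor
  return new RegExp(
    `(?:(?<=^${skip}\\s*(?!\\s)).{1,${width}}(?<!\\s)(?<=^${skip}.{1,${width}})` +
      `|(?<=^${skip})(?=\\s{${width}}))`,
    "gm" // g: all rows, m: ^ anchors at every line start
  );
}

// Hypothetical helper that pads cells to the sample's column widths.
const row = (name, id, source) => name.padEnd(38) + id.padEnd(37) + source;

const table = [
  row("app115", "115.115", "winget"),
  row("3CX Call Flow Designer (.exe edition)", "3CX.3CXCallFlowDesigner", "winget"),
].join("\n");

const names = table.match(columnRegex(0, 38));
const ids = table.match(columnRegex(38, 37));

console.log(names); // → [ 'app115', '3CX Call Flow Designer (.exe edition)' ]
console.log(ids);   // → [ '115.115', '3CX.3CXCallFlowDesigner' ]
```

With an offset of 0 the generated pattern behaves like the first-column regex shown earlier.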

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1