Using multiple wildcards in Pentaho Spoon for subdirectory paths
I am trying to use the GetFiles step to retrieve all .xlsx files that have "sheet" in the file name and whose subdirectory path contains certain common names.
Example directory contents:
c:\DATA\a1 info\a1 z information\a1 box\a1 b2 NEW\a1 sheet.xlsx
c:\DATA\a1 info\a2 zx information\a2 box\a2 b2 NEW\a2 sheet.xlsx
c:\DATA\a1 info\a3 zy information\a3 box\a3 b2 NEW\a3 sheet.xlsx
c:\DATA\a1 task\a1 z task\a1 box\a1 b2 new\sheet.xlsx
c:\DATA\a1 task\a1 z task\a1 box\a1 b2 new\sheet.xlsx
I only want the file names of the files that satisfy the following constraints:
The home directory is c:\DATA.
The first subdirectory has info in its name.
The second subdirectory has information in its name.
The third subdirectory has box in its name.
The fourth subdirectory has NEW in its name.
I have tried
| File/Directory | Wildcard (RegExp) | Exclude wildcard | Required | Include subfolders |
|---|---|---|---|---|
| `C:\DATA\` | `.*.info\.*.information\.*.box\.*.NEW\.*.sheet.*.xlsx` | | N | Y |
| `C:\DATA\` | `.+info\.*.information\.*.box\.*.NEW\.*.sheet.*.xlsx` | | N | Y |
| `C:\DATA\` | `.*info\.*information\.*box\.*NEW\.*sheet.*.xlsx` | | N | Y |
I am at a loss. Thanks in advance.
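One regex detail in the attempts above is worth checking in isolation: in a regular expression, `\.` is an escaped literal dot, so the backslashes in those wildcards never match a path separator. A literal backslash has to be written as `\\`. The sketch below uses Python's `re` module as a stand-in for the regex engine in Pentaho (an assumption, although these particular constructs behave the same in Java regex). The names `SEG`, `SEP`, `FULL_PATH` and `ATTEMPT` are hypothetical. It only demonstrates the pattern logic against the sample paths; the thread does not establish whether the step applies its wildcard to the full path or only to the file name.

```python
import re

# Sample paths from the question (Windows-style, backslash-separated).
PATHS = [
    r"c:\DATA\a1 info\a1 z information\a1 box\a1 b2 NEW\a1 sheet.xlsx",
    r"c:\DATA\a1 info\a2 zx information\a2 box\a2 b2 NEW\a2 sheet.xlsx",
    r"c:\DATA\a1 info\a3 zy information\a3 box\a3 b2 NEW\a3 sheet.xlsx",
    r"c:\DATA\a1 task\a1 z task\a1 box\a1 b2 new\sheet.xlsx",
]

SEG = r"[^\\]*"  # any run of characters that stays inside one path segment
SEP = r"\\"      # one literal backslash, i.e. a path separator

# Hypothetical full-path pattern: one path segment per constraint.
FULL_PATH = re.compile(
    r"[cC]:\\DATA" + SEP
    + SEG + "info" + SEG + SEP
    + SEG + "information" + SEG + SEP
    + SEG + "box" + SEG + SEP
    + SEG + "NEW" + SEG + SEP
    + SEG + "sheet" + SEG + r"\.xlsx"
)

# First wildcard from the question, unchanged: "\." is an escaped dot here.
ATTEMPT = re.compile(r".*.info\.*.information\.*.box\.*.NEW\.*.sheet.*.xlsx")

# Paths that satisfy all four subdirectory constraints.
matched = [p for p in PATHS if FULL_PATH.fullmatch(p)]

# The same first path, relative to c:\DATA\, tested against the attempt.
relative = PATHS[0].split("\\", 2)[2]
attempt_hit = ATTEMPT.search(relative)
```

Here `matched` holds the three `info` paths and `attempt_hit` is `None`, because the original wildcard requires "information" to follow "info" after at most a few dots and one arbitrary character.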
Solution 1:[1]
Drive the step with data from a previous step: send it input rows that carry these parameters, with each row setting the directory and the wildcard (exten):
EXAMPLE
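-- One row per directory to scan, with the wildcard (regex) for its file names
CREATE TABLE testdir (
    directory TEXT,
    exten CHARACTER VARYING(15)
);
-- "\." matches a literal dot before the extension
INSERT INTO testdir
    (directory, exten)
VALUES ('C:\Users\...\Documents\revision\', '.*\.(xlsx|XLSX)'),
       ('C:\Users\...\Downloads\...\', '.*\.(xls|XLS)'),
       ('D:\...\Origen\ETA\', '.*\.(txt|TXT)');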
And a transformation like this: [image]
Step configuration: [image]
And the results: [image]
I think that should work for you.
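As a rough illustration of the row-driven idea in this answer, the sketch below applies a list of (directory, wildcard) rows to a directory listing. It is not Pentaho code: `list_matching` is a hypothetical helper, the files live in a temporary directory, and it assumes the wildcard must match the whole file name, which the thread does not confirm. It also shows why the dot before the extension is better escaped: an unescaped `.` matches any character.

```python
import os
import re
import tempfile


def list_matching(directory, wildcard):
    """Return sorted file names in `directory` whose whole name matches `wildcard`."""
    pattern = re.compile(wildcard)
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and pattern.fullmatch(name)
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files, created empty.
    for name in ("a1 sheet.xlsx", "REPORT.XLSX", "notes.txt", "fakexlsx"):
        open(os.path.join(tmp, name), "w").close()

    # One (directory, wildcard) pair per row, as in the answer's table.
    rows = [(tmp, r".*\.(xlsx|XLSX)"), (tmp, r".*\.(txt|TXT)")]
    results = {wildcard: list_matching(directory, wildcard) for directory, wildcard in rows}

    # Same wildcard with an unescaped dot: also accepts "fakexlsx".
    loose = list_matching(tmp, r".*.(xlsx|XLSX)")
```

With the escaped dot, only names that end in a real `.xlsx` or `.XLSX` extension are selected; the loose variant additionally picks up `fakexlsx`.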
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Diego De Vita |
