Perl: reading a file into an array with varying row lengths gives "Use of uninitialized value"
First post, so please forgive any poor formatting. This is more of an annoying output problem than anything: it makes it hard to scan for real errors. In short, I have to break every row of a large file into individual characters, but the rows are not all the same length, so towards the end I get a blast of "Use of uninitialized value" warnings. The script works fine, but it's hard to see the output I actually need, which shows what is going on and which lines the script chooses not to use. Details and the relevant script are below.
use warnings;
use strict;
Maybe if I didn't use these my problem would go away, but I'd like to keep them.
I have a Perl script that manipulates a PDB file, which needs its values in exact column positions. The script takes in 10-50K files. It first breaks every line into individual characters (stored in the array @line) and joins specific ranges of characters into elements of an array called @column. It then checks the first string of 6 characters and throws out any line that doesn't match one of three specific strings. It also changes column 22 to the letter "A", and finally writes everything to another file. The lines range from 3 to 80 characters, so whenever a line has fewer than 80 characters, every array element that is missing triggers this warning. I saw a post with a similar problem, but it dealt with a CSV file, an approach I can't use as explained. I can't just split on spaces either because, as you can see in the example below, fields bleed into each other, so the parsing has to be column-specific.
read-in section:
while (my $row = <FH>) {
chomp $row;
$row =~ s/^\s+//;
@line = split(//, $row);
$column[0] = join ('', @line[0..5]);
$column[1] = join ('', @line[6..10]);
$column[2] = join ('', @line[11..15]);
$column[3] = join ('', $line[16]);
$column[4] = join ('', @line[17..19]);
$column[5] = join ('', $line[20]);
$column[6] = join ('', $line[21]); # Chain ID
$column[7] = join ('', @line[22..25]); #residue number
$column[8] = join ('', $line[26]);
$column[9] = join ('', @line[27..37]);
$column[10] = join ('', @line[38..45]);
$column[11] = join ('', @line[46..53]);
$column[12] = join ('', @line[54..59]);
$column[13] = join ('', @line[60..65]);
$column[14] = join ('', @line[66..75]);
$column[15] = join ('', @line[76..77]);
This warning is printed for 60+ short rows:
no match
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 35, <FH> line 33828.
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 35, <FH> line 33828.
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 35, <FH> line 33828.
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 36, <FH> line 33828.
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 36, <FH> line 33828.
Use of uninitialized value in join or string at ./change_chain_ID_to_A.pl line 36, <FH> line 33828.
etc...etc
example lines from input file
HETATM33701 CA    CA I2111      20.810  32.443 -53.618  1.00  0.00          Ca
HETATM33702 CA    CA I2112      -7.146  39.054 -51.559  1.00  0.00          Ca
CONECT 3502 3501 4093
CONECT 4093 3502 4092
CONECT119241192312515
CONECT125151192412514
CONECT203462034520937
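The warnings can be reproduced in isolation. The sketch below assumes a 21-character row like the CONECT lines above: slicing past the last index of @line yields undef elements, and join warns about them. The $SIG{__WARN__} handler is only there to collect the warnings so they can be counted; it is not part of the original script.

```perl
use strict;
use warnings;

# Assumed sample row, modelled on the short CONECT lines in the input file.
my $row  = 'CONECT 3502 3501 4093';
my @line = split //, $row;

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# The last valid index of @line is 20, so every element of this slice
# is undef and join complains about uninitialized values.
my $residue = join '', @line[22..25];

printf "characters in row: %d\n", scalar @line;
printf "warnings captured: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
```

The joined value itself is just an empty string, which is why the script still works despite the noise.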
Solution 1:[1]
The following code is provided for educational purposes only, with an emphasis on parsing fixed-length structured data records.
The OP did not provide enough information for a suggestion that would point them in the right direction.
The correct approach is to use a CPAN genetics module that was designed specifically for this job.
The demo code uses the unpack function to extract each record into a data structure.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
use YAML;
my($data,$model,$index);
while( <DATA> ) {
chomp;
next if /^ENDMDL|^\s*\z/;   # skip ENDMDL and blank lines (empty after chomp)
my $item;
$index = 0 if /^MODEL/;
$model = parse_model($_) if /^MODEL/;
$item = parse_atom($_) if /^ATOM/;
$item = parse_hetatm($_) if /^HETATM/;
$item = parse_ter($_) if /^TER/;
$data->{$model->{serial}}[$index++] = $item if defined $item;
}
#say Dumper($data);
say Dump($data);
sub parse_model {
my $line = shift;
my $model;
my @fields = qw/record_name serial/;
$model->@{@fields} = unpack('a6x4a4',$line);
defined($model->{$_}) && $model->{$_} =~ s/^\s+|\s+\z//g for @fields;
return $model;
}
sub parse_atom {
my $line = shift;
my $atom;
my @fields = qw/record_name serial name altLoc resName chainID resSeq iCode x y z occupancy tempFactor element charge/;
# One template item per field; x skips the unused columns 12, 21, 28-30 and 67-76
$atom->@{@fields} = unpack('a6a5xa4aa3xaa4ax3a8a8a8a6a6x10a2a2',$line);
defined($atom->{$_}) && $atom->{$_} =~ s/^\s+|\s+\z//g for @fields;
return $atom;
}
sub parse_hetatm {
my $line = shift;
my $hetatm;
my @fields = qw/record_name serial name altLoc resName chainID resSeq iCode x y z occupancy tempFactor element charge/;
# HETATM uses the same column layout as ATOM
$hetatm->@{@fields} = unpack('a6a5xa4aa3xaa4ax3a8a8a8a6a6x10a2a2',$line);
defined($hetatm->{$_}) && $hetatm->{$_} =~ s/^\s+|\s+\z//g for @fields;
return $hetatm;
}
sub parse_ter {
my $line = shift;
my $ter;
my @fields = qw/record_name serial resName chainID resSeq iCode/;
# resName is columns 18-20, column 21 is blank, chainID is column 22
$ter->@{@fields} = unpack('a6a5x6a3xaa4a',$line);
defined($ter->{$_}) && $ter->{$_} =~ s/^\s+|\s+\z//g for @fields;
return $ter;
}
__DATA__
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
MODEL        1
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
HETATM 3835 FE   HEM A   1      17.140   3.115  15.066  1.00 14.14          FE
HETATM 8238  S   SO4 A2001      10.885 -15.746 -14.404  1.00 47.84           S
HETATM 8239  O1  SO4 A2001      11.191 -14.833 -15.531  1.00 50.12           O
HETATM 8240  O2  SO4 A2001       9.576 -16.338 -14.706  1.00 48.55           O
HETATM 8241  O3  SO4 A2001      11.995 -16.703 -14.431  1.00 49.88           O
HETATM 8242  O4  SO4 A2001      10.932 -15.073 -13.100  1.00 49.91           O
ATOM    293 1HG  GLU A  18     -14.861  -4.847   0.361  1.00  0.00           H
ATOM    294 2HG  GLU A  18     -13.518  -3.769   0.084  1.00  0.00           H
TER     295      GLU A  18
ENDMDL
MODEL        2
ATOM    296  N   ALA A   1      10.883   6.779  -6.464  1.00  0.00           N
ATOM    297  CA  ALA A   1      11.451   6.531  -5.142  1.00  0.00           C
HETATM 3835 FE   HEM A   1      17.140   3.115  15.066  1.00 14.14          FE
HETATM 8238  S   SO4 A2001      10.885 -15.746 -14.404  1.00 47.84           S
HETATM 8239  O1  SO4 A2001      11.191 -14.833 -15.531  1.00 50.12           O
HETATM 8240  O2  SO4 A2001       9.576 -16.338 -14.706  1.00 48.55           O
HETATM 8241  O3  SO4 A2001      11.995 -16.703 -14.431  1.00 49.88           O
HETATM 8242  O4  SO4 A2001      10.932 -15.073 -13.100  1.00 49.91           O
ATOM    588 1HG  GLU A  18     -13.363  -4.163  -2.372  1.00  0.00           H
ATOM    589 2HG  GLU A  18     -12.634  -3.023  -3.475  1.00  0.00           H
TER     590      GLU A  18
ENDMDL
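For the task described in the question (keep only certain record types and set the chain ID in column 22 to "A"), one possible sketch is to edit each row in place with substr rather than splitting it into characters, which avoids the undef elements altogether. The record names in %keep, the helper name set_chain_a, and the sample rows are assumptions made for illustration; the question does not list the three strings it filters on.

```perl
use strict;
use warnings;

# Record types to keep. The question filters on three record names but does
# not list them, so this set is an assumption.
my %keep = map { $_ => 1 } qw(ATOM HETATM TER);

# Hypothetical helper: returns the row with column 22 set to 'A',
# or undef when the record type is not in %keep.
sub set_chain_a {
    my ($row) = @_;

    # Columns 1-6 hold the record name. The 'A' format strips trailing
    # spaces and copes with rows shorter than six characters.
    my $record = unpack('A6', $row) // '';
    return undef unless $keep{$record};

    # Pad short rows so that column 22 exists before assigning to it.
    $row = sprintf '%-22s', $row;
    substr($row, 21, 1) = 'A';    # column 22 is offset 21
    return $row;
}

# Assumed sample rows: a full-length HETATM row, a CONECT row, a bare TER row.
my @rows = (
    'HETATM33701 CA    CA I2111      20.810  32.443 -53.618  1.00  0.00          Ca',
    'CONECT 3502 3501 4093',
    'TER',
);

my (@out, @skipped);
for my $row (@rows) {
    my $fixed = set_chain_a($row);
    if (defined $fixed) { push @out, $fixed }
    else                { push @skipped, $row }
}

print "$_\n" for @out;
print "skipped: $_\n" for @skipped;
```

Because the row stays a single string, nothing is ever undef, so no warnings are emitted for short rows and the skipped lines are easy to spot in the output.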
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
