Weka machine learning - Interpreting Naive Bayes

I have a training dataset of sick horses, containing data about surgeries and diseases. The fields in each record include the horse's temperature, age, pulse, respiratory rate, and so on.

I want to build a classifier on the lived/died/euthanized column of each row. I have been asked to:

  • Think about the hypothesis that the variables are independent
  • Check whether I have enough instances to obtain reliable probabilities

About 25% of the values in the dataset were missing, and they were imputed using MIMMI imputation.

On the question of reliable probabilities, I can see that the training dataset is slightly unbalanced: 179 horses lived and 121 died (died + euthanized). But I'm not really sure about this. Any help with these two questions would be much appreciated.

=== Run information ===

Scheme:weka.classifiers.bayes.NaiveBayes 
Relation:     horseColic-weka.filters.unsupervised.attribute.Remove-R25-27
Instances:    300
Attributes:   24
              surgery
              age
              id
              temp
              pulse
              respRate
              tempExtrem
              periPulse
              mucMemb
              capRefT
              pain
              peri
              abdDist
              ngTube
              ngReflux
              ngRPH
              feces
              abd
              pCellVol
              totProt
              abdCentApp
              abdCentTotProt
              outc
              surgLes
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                                  Class
Attribute                         lived         died   euthanized
                                 (0.59)       (0.26)       (0.15)
==================================================================
surgery
  yes                               97.0         59.0         28.0
  no                                84.0         20.0         18.0
  [total]                          181.0         79.0         46.0

age
  adult                            168.0         67.0         44.0
  young                             13.0         12.0          2.0
  [total]                          181.0         79.0         46.0

id
  mean                      1009274.0202 1452556.3598  751596.8611
  std. dev.                 1431022.1677 1887025.7703  989556.6807
  weight sum                         179           77           44
  precision                    16915.735    16915.735    16915.735

temp
  mean                           34.8733      35.0055       33.054
  std. dev.                      10.2335      13.0545      14.9588
  weight sum                         179           77           44
  precision                       0.9275       0.9275       0.9275

pulse
  mean                           29.2039      33.2115      29.0187
  std. dev.                      10.8578      14.6404      16.7248
  weight sum                         179           77           44
  precision                       0.9107       0.9107       0.9107

respRate
  mean                           15.0771      16.9169      15.9348
  std. dev.                       8.9803       7.0278       8.1221
  weight sum                         179           77           44
  precision                       0.8667       0.8667       0.8667

tempExtrem
  normal                            82.0         16.0         12.0
  warm                              36.0          7.0          3.0
  cool                              53.0         48.0         25.0
  cold                              12.0         10.0          8.0
  [total]                          183.0         81.0         48.0

periPulse
  normal                           133.0         22.0         11.0
  increased                          5.0          8.0          7.0
  reduced                           43.0         47.0         25.0
  absent                             2.0          4.0          5.0
  [total]                          183.0         81.0         48.0

mucMemb
  normal-pink                       95.0          9.0          7.0
  bright-pink                       23.0         13.0          6.0
  pale-pink                         37.0         19.0         12.0
  pale-cyanotic                     16.0         17.0         12.0
  bright-red                         7.0         14.0          8.0
  dark-cyanotic                      7.0         11.0          5.0
  [total]                          185.0         83.0         50.0

capRefT
  short                            153.0         46.0         23.0
  long                              28.0         33.0         23.0
  long2                              1.0          1.0          1.0
  [total]                          182.0         80.0         47.0

pain
  no-pain                           53.0          6.0          8.0
  depressed                         42.0         21.0         14.0
  inte-mild-pain                    64.0         10.0          8.0
  inte-severe-pain                  12.0         18.0         12.0
  cont-severe-pain                  13.0         27.0          7.0
  [total]                          184.0         82.0         49.0

peri
  hypermotile                       42.0          7.0          7.0
  normal                            22.0          8.0          5.0
  hypomotile                        90.0         37.0         17.0
  absent                            29.0         29.0         19.0
  [total]                          183.0         81.0         48.0

abdDist
  none                              88.0         17.0         13.0
  slight                            53.0         18.0          8.0
  moderate                          28.0         30.0         14.0
  severe                            14.0         16.0         13.0
  [total]                          183.0         81.0         48.0

ngTube
  none                              79.0         40.0         27.0
  slight                            90.0         32.0         15.0
  significant                       13.0          8.0          5.0
  [total]                          182.0         80.0         47.0

ngReflux
  none                             149.0         50.0         30.0
  much                              17.0         15.0          6.0
  less                              16.0         15.0         11.0
  [total]                          182.0         80.0         47.0

ngRPH
  mean                           11.3797      13.0882       8.0606
  std. dev.                       2.3535       3.2916       5.1673
  weight sum                         179           77           44
  precision                       0.7917       0.7917       0.7917

feces
  normal                            77.0         14.0         10.0
  increased                         16.0         14.0          8.0
  decreased                         44.0         15.0         11.0
  absent                            46.0         38.0         19.0
  [total]                          183.0         81.0         48.0

abd
  normal                            48.0         13.0          4.0
  other                             39.0          5.0          7.0
  firm-large-intestine              18.0          8.0          6.0
  dist-small-intest                 32.0         24.0          8.0
  distended-large-intest            47.0         32.0         24.0
  [total]                          184.0         82.0         49.0

pCellVol
  mean                           31.0162      47.0465      46.0112
  std. dev.                      14.1207      18.5468       17.672
  weight sum                         179           77           44
  precision                       0.9518       0.9518       0.9518

totProt
  mean                           42.6539       41.451      43.7936
  std. dev.                      16.9138      18.6362      19.3247
  weight sum                         179           77           44
  precision                       0.9432       0.9432       0.9432

abdCentApp
  clear                            112.0         25.0         10.0
  cloudy                            54.0         22.0         20.0
  serosanguinous                    16.0         33.0         17.0
  [total]                          182.0         80.0         47.0

abdCentTotProt
  mean                           16.1341      21.1634      14.3203
  std. dev.                       6.8038       4.9109       8.6619
  weight sum                         179           77           44
  precision                       0.8837       0.8837       0.8837

surgLes
  yes                               94.0         70.0         30.0
  no                                87.0          9.0         16.0
  [total]                          181.0         79.0         46.0



Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         216               72      %
Incorrectly Classified Instances        84               28      %
Kappa statistic                          0.5134
Mean absolute error                      0.1965
Root mean squared error                  0.3803
Relative absolute error                 52.8451 %
Root relative squared error             88.2672 %
Total Number of Instances              300     

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.777     0.198      0.853     0.777     0.813      0.873    lived
                 0.675     0.175      0.571     0.675     0.619      0.871    died
                 0.568     0.082      0.543     0.568     0.556      0.824    euthanized
Weighted Avg.    0.72      0.175      0.735     0.72      0.725      0.865

=== Confusion Matrix ===

   a   b   c   <-- classified as
 139  28  12 |   a = lived
  16  52   9 |   b = died
   8  11  25 |   c = euthanized


Solution 1:[1]

Naive Bayes makes the strong assumption that all attributes are independent of each other given the class. In this case that means age, surgery, temp and the rest are treated as mutually independent within each outcome. This is often not true in practice. Even so, Naive Bayes generally obtains decent results with little training, although it is normally not as good as a model whose assumptions fit the data better. Finding such a model takes time and effort, and a Naive Bayes model will often reach adequate accuracy. I'm not sure about your sample size; you'll have to look at the statistical power of your dataset.
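
As a rough way to look at the sample-size question, the per-class counts in the Weka model can be inspected directly. The sketch below assumes the nominal counts are add-one (Laplace) smoothed, which is consistent with each [total] exceeding the class weight sum by the number of attribute values (for example 181 = 179 + 2 for surgery). The threshold of 5 observations and the function names are illustrative choices, not Weka settings, and only three attributes are copied in.

```python
# Smoothed counts copied from the Weka model in the question,
# in class order (lived, died, euthanized). Only three attributes are shown.
CLASSES = ("lived", "died", "euthanized")
CLASS_SIZES = (179, 77, 44)  # the "weight sum" rows in the Weka output

SMOOTHED = {
    "capRefT": {
        "short": (153, 46, 23),
        "long": (28, 33, 23),
        "long2": (1, 1, 1),
    },
    "periPulse": {
        "normal": (133, 22, 11),
        "increased": (5, 8, 7),
        "reduced": (43, 47, 25),
        "absent": (2, 4, 5),
    },
    "surgLes": {"yes": (94, 70, 30), "no": (87, 9, 16)},
}

MIN_COUNT = 5  # hypothetical rule-of-thumb threshold, not a Weka parameter


def sparse_cells(table, min_count=MIN_COUNT):
    """Return (attribute, value, class, raw_count) for thinly populated cells."""
    flagged = []
    for attr, values in table.items():
        for value, counts in values.items():
            for cls, smoothed in zip(CLASSES, counts):
                raw = smoothed - 1  # undo the assumed add-one smoothing
                if raw < min_count:
                    flagged.append((attr, value, cls, raw))
    return flagged


def posterior(observed, table=SMOOTHED):
    """Naive Bayes posterior over CLASSES for a dict of attribute -> value."""
    n = sum(CLASS_SIZES)
    scores = []
    for i, size in enumerate(CLASS_SIZES):
        # Unsmoothed class prior; Weka's own prior may differ slightly.
        score = size / n
        for attr, value in observed.items():
            total = sum(counts[i] for counts in table[attr].values())
            # Independence assumption: one P(value | class) factor per attribute.
            score *= table[attr][value][i] / total
        scores.append(score)
    norm = sum(scores)
    return {cls: s / norm for cls, s in zip(CLASSES, scores)}


if __name__ == "__main__":
    for cell in sparse_cells(SMOOTHED):
        print(cell)
    print(posterior({"surgLes": "no", "periPulse": "normal"}))
```

Under that smoothing assumption, capRefT = long2 has no underlying observations in any class, so its probability comes entirely from the smoothing. The product inside posterior is where the independence assumption enters: each attribute contributes its own factor regardless of the values of the others, so strongly correlated attributes get counted more than once.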

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dronious