Is employing BPNN for water quality management overkill? [closed]

I'm developing a device for freshwater quality management that can be used in freshwater bodies such as lakes and rivers. The project is split into three parts:

  1. The first part deals with acquiring parameters such as pH, turbidity etc.
  2. The second part deals with taking corrective measures based on the parameters. For instance, if the pH is too low, the device will inject a basic solution to maintain a pH of 7-7.5.
  3. The third part deals with predicting the health of the lake from the acquired parameters (pH, turbidity, etc.). The predictive algorithm should take the parameters into account and model the correlations between them to estimate how long the lake will sustain. To achieve this, I'm currently leaning toward a Back Propagation Neural Network (BPNN), as I have found that several other people and institutes prefer neural networks for water quality management.*

My concern is whether using a BPNN would be overkill for this project. If so, which method or tool should I use instead?

*1,2 and 3



Solution 1:[1]

Doing something the way it has always been done is not always the best idea. In general, if you do not have strong analytical reasons to choose a neural network, you should not start with one. Neural networks are tricky to train, have a huge number of hyperparameters, are non-deterministic, and are computationally expensive. Always start with the simplest model, and move to more complex ones only if it yields poor results. From a theoretical perspective this is strongly justified by Vapnik's theorems, and from a practical one it is similar to the agile approach in programming.

So where to start?

  1. Linear regression (Ridge regression, Lasso)
  2. Polynomial regression
  3. KNN regression
  4. RBF Networks
  5. Random Forest Regressor

If all of them fail, think about a "classical" neural network. But the chances of that are rather... small.
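A minimal sketch of that workflow, assuming scikit-learn is installed and using synthetic data in place of real lake measurements. The three features, their ranges, and the target formula are made up for illustration only. RBF networks are left out because, as far as I know, scikit-learn does not ship a dedicated estimator for them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption): 300 samples with 3 features playing
# the role of pH, turbidity and temperature; the target is a made-up score.
rng = np.random.default_rng(0)
X = rng.uniform(low=[6.0, 0.0, 5.0], high=[9.0, 50.0, 30.0], size=(300, 3))
y = 10.0 - 2.0 * np.abs(X[:, 0] - 7.25) - 0.05 * X[:, 1] + rng.normal(0.0, 0.3, 300)

# Candidate models, simplest first.
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "polynomial": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), Ridge(alpha=1.0)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validated R^2 for each candidate.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:>14}: mean R^2 = {scores[name]:.3f}")
```

If one of the simple models already reaches an acceptable cross-validated score on your real data, there is little reason to move on to a neural network.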

Solution 2:[2]

A neural network is a function approximator. If you have real-valued input vectors, and each of those vectors is associated with a target real number or a class such as "good", "bad", "red", etc., then a neural network can be used to solve your problem.

In their simplest form, neural networks are functions of the form n(x) := g(W h(Ax + b) + c), where A and W are matrices, b and c are vectors, h is a component-wise nonlinear function (generally a sigmoid), and g is a function taking values in your target space. In your case, the input vector x would contain pH, turbidity, etc., and the target would be how long the lake will sustain. If the network is "trained" properly, then given an unseen input u (new measurements of pH, turbidity, etc.) it will compute a good approximation of how long the lake will sustain.

"Training" a neural network consists of choosing the parameters A, W, b, c. How many of these parameters there are depends on the size of the hidden layer, that is, on how many rows you choose for A (and therefore how many entries for b and how many columns for W). One way to choose these parameters is so that n(x) is close to your actual, measured targets on all of the historical (training) examples you have. More specifically, A, W, b, c are chosen to minimize E(A,W,b,c) := the sum over all training examples x of (n(x) - t(x))^2, where t(x) is your historically measured target (how long the lake sustained when the pH and turbidity were as measured in x). One way to try to minimize E over A, W, b, c is gradient descent: compute the gradient of E with respect to each of the parameters, using an algorithm called back-propagation, and then take a step in the direction of the negative gradient.
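As a rough illustration of the training procedure described above, here is a minimal NumPy sketch of n(x) = W h(Ax + b) + c, with g taken as the identity and h a sigmoid. The toy target t(x) = x1 * x2, the hidden size, the learning rate, and the number of steps are all assumptions chosen for illustration. The loss is the mean rather than the sum of squared errors, which only rescales the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 200 inputs with 2 features, target t(x) = x1 * x2.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
t = (X[:, 0] * X[:, 1]).reshape(-1, 1)

hidden = 8  # hidden-layer size: rows of A, entries of b, columns of W
A = rng.normal(0.0, 0.5, size=(hidden, 2))
b = np.zeros((hidden, 1))
W = rng.normal(0.0, 0.5, size=(1, hidden))
c = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X):
    """Return the hidden activations H and the predictions n(x)."""
    H = sigmoid(A @ X.T + b)   # shape (hidden, N)
    return H, (W @ H + c).T    # predictions have shape (N, 1)


def loss(X, t):
    """Mean squared error between n(x) and t(x)."""
    return float(np.mean((forward(X)[1] - t) ** 2))


initial_loss = loss(X, t)
learning_rate = 0.1

for step in range(3000):
    H, y = forward(X)
    # Back-propagation: chain rule from the output back to A and b.
    d_y = 2.0 * (y - t).T / len(X)             # dE/dn, shape (1, N)
    grad_W = d_y @ H.T                         # shape (1, hidden)
    grad_c = d_y.sum(axis=1, keepdims=True)    # shape (1, 1)
    d_Z = (W.T @ d_y) * H * (1.0 - H)          # through the sigmoid
    grad_A = d_Z @ X                           # shape (hidden, 2)
    grad_b = d_Z.sum(axis=1, keepdims=True)    # shape (hidden, 1)
    # Gradient descent: step against the gradient.
    A -= learning_rate * grad_A
    b -= learning_rate * grad_b
    W -= learning_rate * grad_W
    c -= learning_rate * grad_c

final_loss = loss(X, t)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In practice a library would compute these gradients for you; writing them out by hand is only meant to show which part is back-propagation and which part is the gradient descent step.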

I want to note that evaluating a neural network with fixed parameters is deterministic. Some training algorithms are not deterministic (for example, those that initialize the parameters randomly or estimate the gradient of E from random subsets of the data), while others are.
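A small sketch of that distinction, assuming NumPy. The helpers `init_params` and `n`, the layer sizes, and the sample readings are hypothetical names and values made up for this example, and h is a tanh here purely for brevity.

```python
import numpy as np


def init_params(seed, n_inputs=2, hidden=4):
    """Randomly initialise A, b, W, c from a seeded generator."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(hidden, n_inputs))
    b = np.zeros(hidden)
    W = rng.normal(size=(1, hidden))
    c = np.zeros(1)
    return A, b, W, c


def n(x, params):
    """Evaluate n(x) = W h(Ax + b) + c with h = tanh and g the identity."""
    A, b, W, c = params
    return W @ np.tanh(A @ x + b) + c


x = np.array([7.2, 3.5])  # hypothetical pH and turbidity readings

# Fixed parameters: evaluating twice gives the same output.
params = init_params(seed=0)
same_output = np.array_equal(n(x, params), n(x, params))

# Random initialisation: reproducible only when the seed is fixed.
same_seed = np.array_equal(n(x, init_params(0)), n(x, init_params(0)))
other_seed = np.array_equal(n(x, init_params(0)), n(x, init_params(1)))

print(same_output, same_seed, other_seed)
```

The first two comparisons should come out equal and the third should not, which is why fixing the random seed is the usual way to make a training run repeatable.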

So, with all that as background, are neural networks overkill for your project? That depends on the function you're trying to approximate from your observations to the output you're trying to predict. Whether a neural network will give you good prediction accuracy depends on many factors, perhaps the most important of which is how many examples you have to train on. If you don't have very many training examples relative to the number of predictors, a neural network may not be what you're looking for, but for the most part, that's an empirical question more than a theoretical one.

The nice thing is that, if you're willing to use Python, there are good libraries that make all of this testing very easy. If you try a neural network and it doesn't give you very good predictions, there are many other regression methods you could try, for example linear regression (which is a special case of a neural network) or a random forest. Both are easy to code up in Python if you use sklearn. There are also a few libraries for neural networks that make playing with them pretty easy. I recommend TensorFlow for neural networks.

My recommendation would be to spend a little bit of time trying several methods. For a relatively simple prediction problem like this, the time to train your network should be pretty short. The training times of days or weeks you may have heard about are for massive datasets with millions or billions of training examples and models with millions of parameters.

Here is a toy neural network I created to "learn" to approximate the function f(a,b,c) = abc: http://pastebin.com/KrUAX9je

Solution 3:[3]

Backpropagation (BP) is a method for learning the parameters of an artificial neural network with gradient descent: it computes the gradients efficiently. There are other methods for training such models, but BP is the most commonly used. I do not know the scale of the project or the amount of data collected, but neural networks are more effective when the number of examples is large. If you have, say, 10 attributes (pH, turbidity, ...) and more than 2-3k examples, then neural networks could be helpful.

However, you should not think neural networks are the BEST model ever. You need to try out different models and choose the one giving you the best performance.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lejlot
Solution 2 Ryan Stout
Solution 3