No free lunch theorem.
Curses of machine learning:
1. Overfitting.
2. High dimensionality – often we must learn from very high-dimensional data: the pixels in an image, the clicks or ratings that identify your preferences, the words on pages, and so on. Most of these dimensions (attributes) are irrelevant to what we are currently trying to learn, and including them can ruin prediction. We don't have an automated way of finding which attributes are relevant to a particular learning task. This is particularly a problem for nearest-neighbor methods.
As the number of features/dimensions grows, the amount of data we need to generalize accurately grows exponentially.
Rationalism vs empiricism
How can we ever be justified in generalizing from what we have seen to what we haven't?
Credit assignment problem
Backpropagation algorithm – it is not guaranteed to find the global optimum.
Gradient ascent and descent.
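A minimal sketch of gradient descent (the objective here is a made-up one-dimensional example; gradient ascent is the same step with the sign flipped):

```python
# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to shrink the objective."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min converges toward the minimizer at x = 3.
```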
Autoencoder: a multilayer perceptron trained to reproduce its input at its output, with a hidden layer much smaller than the input and output layers.
Stacked sparse autoencoders learn high-level concepts like faces from low-level concepts like edges and shades, hierarchically.
Human intelligence boils down to a single algorithm – Andrew Ng.
Convolutional neural networks are inspired by the visual cortex.
Optimal learning is the Bayesians' central goal.
Laplace’s Rule of Succession:
The probability that an event will occur again after it has occurred n times in succession.
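Laplace's estimate works out to (successes + 1) / (trials + 2); after n successes in a row it gives (n + 1) / (n + 2). A tiny sketch:

```python
def rule_of_succession(successes, trials):
    """Laplace's estimate of the probability that the event occurs on
    the next trial, given `successes` occurrences in `trials` trials."""
    return (successes + 1) / (trials + 2)

# After an event has occurred 5 times in a row: (5+1)/(5+2) = 6/7.
p = rule_of_succession(5, 5)
```

With no data at all it sensibly returns 1/2.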
A controversy in the definition of probability:
- Bayesians: prior probability is a subjective degree of belief.
- Frequentists: probability is the frequency with which an event occurs in the sample space.
Bayesian learning is computationally costly.
A learner that assumes different effects are independent given the cause is called the Naive Bayes classifier.
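A toy Naive Bayes sketch (the two-feature weather data is made up, and add-one smoothing with a two-value denominator is an assumption of this sketch):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (feature_tuple, label).
    Records class priors and per-class counts of (position, value)."""
    priors = Counter(label for _, label in examples)
    counts = defaultdict(Counter)
    for features, label in examples:
        for i, v in enumerate(features):
            counts[label][(i, v)] += 1
    return priors, counts

def predict(priors, counts, features):
    """Pick the label maximizing log P(label) + sum_i log P(f_i | label),
    treating features as independent given the label (the naive part)."""
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for label, n in priors.items():
        score = math.log(n / total)
        for i, v in enumerate(features):
            # add-one smoothing so unseen values never give probability 0
            score += math.log((counts[label][(i, v)] + 1) / (n + 2))
        if score > best_score:
            best, best_score = label, score
    return best

data = [(("sunny", "warm"), "yes"), (("sunny", "warm"), "yes"),
        (("rainy", "cold"), "no"), (("rainy", "warm"), "no")]
priors, counts = train_naive_bayes(data)
predict(priors, counts, ("sunny", "warm"))  # "yes"
```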
PageRank uses the idea of a Markov chain: web pages with many incoming links are probably more important than pages with few.
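A sketch of PageRank as power iteration on that Markov chain (the three-page graph is made up; real systems also handle dangling pages and operate at a vastly larger scale):

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict node -> list of outgoing links. Each iteration, a page
    passes a damped share of its rank to the pages it links to; the rest
    is spread uniformly (the random-surfer teleport)."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            share = damping * rank[v] / len(outs)
            for w in outs:
                new[w] += share
        rank = new
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
r = pagerank(graph)
# "b" has the most incoming weight, so it ends up ranked highest.
```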
Hidden Markov models are used for inference in speech recognition.
The continuous version of the HMM is the Kalman filter.
Markov chain Monte Carlo (MCMC) is used to make the distributions of a Bayesian network converge.
MCMC is a random walk on a Markov chain; in the long run, the number of times each state is visited is proportional to its probability.
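The long-run visit-frequency idea can be sketched on a made-up two-state chain whose stationary distribution is (2/3, 1/3):

```python
import random

def random_walk_visits(transitions, start, steps, seed=0):
    """transitions: dict state -> list of (next_state, prob).
    Counts visits during a long random walk; the visit fractions
    approach the chain's stationary distribution."""
    rng = random.Random(seed)
    state = start
    visits = {s: 0 for s in transitions}
    for _ in range(steps):
        visits[state] += 1
        r, acc = rng.random(), 0.0
        for nxt, p in transitions[state]:
            acc += p
            if r < acc:
                state = nxt
                break
    return {s: c / steps for s, c in visits.items()}

# Stationary distribution here is A: 2/3, B: 1/3.
chain = {"A": [("A", 0.75), ("B", 0.25)],
         "B": [("A", 0.5), ("B", 0.5)]}
freq = random_walk_visits(chain, "A", steps=200_000)
```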
Applying probability to medical diagnosis.
Analogizers can work with less data.
They don’t form a model.
Analogy is behind many scientific advances.
If two things are similar, the thought of one will tend to trigger the thought of the other – Aristotle.
Algorithms in the analogy domain:
1. K-nearest neighbor (weighted)
Doesn't work well with lots of dimensions (hyperspace).
Can't identify which attributes are relevant.
Discovering the low-dimensional "blanket" (manifold) the data lies on within hyperspace.
2. Support vector machines
The weights have a single optimum instead of many local ones – an advantage over the multilayer perceptron.
Only one layer.
Extending the data into new dimensions (where it becomes separable).
3. Full-blown analogical reasoning.
Learning across problem domains.
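The weighted k-nearest-neighbor idea from item 1 can be sketched as inverse-distance-weighted voting (the 2-D points and labels are made up):

```python
def weighted_knn_predict(points, query, k=3):
    """points: list of ((x, y), label). Votes among the k nearest
    neighbors, weighting each vote by inverse squared distance."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(points, key=lambda pl: dist2(pl[0], query))[:k]
    weights = {}
    for p, label in nearest:
        # small epsilon avoids division by zero at an exact match
        weights[label] = weights.get(label, 0.0) + 1.0 / (dist2(p, query) + 1e-9)
    return max(weights, key=weights.get)

train = [((0, 0), "red"), ((0, 1), "red"),
         ((5, 5), "blue"), ((6, 5), "blue")]
weighted_knn_predict(train, (1, 1))  # "red"
```

Note how every stored point is consulted at prediction time: the learner never builds an explicit model, which is exactly the analogizers' style.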
An algorithm that could spontaneously group together similar objects or different images of the same object – clustering problem.
K-means, naive Bayes, EM algorithms.
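A toy 1-D k-means sketch (crude initialization and made-up data, just to show the assign-then-recenter loop):

```python
def kmeans_1d(values, k=2, iters=20):
    """Assign each value to its nearest centroid, then move each
    centroid to the mean of its cluster; repeat until stable."""
    centroids = sorted(values)[:k]  # crude init: first k sorted values
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[i].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 9:
centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2)
```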
Isomap – for nonlinear dimensionality reduction.
Reinforcement learning – optimal control in an unknown environment.
DeepMind is at the intersection of reinforcement learning and multilayer perceptrons.
Meta-learning: combining multiple learning algorithms.
Stacking – a meta-learner.
Bagging – generate multiple training sets by randomly sampling the original with replacement, apply the learning algorithm to each, and combine the predictions. This decreases variance and increases accuracy.
Boosting – a meta-learner.
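The bagging recipe can be sketched as bootstrap resampling plus a majority vote (the 1-nearest-neighbor base learner and the 1-D data here are stand-ins, not from the notes):

```python
import random

def bag_predict(train, query, base_learner, n_bags=25, seed=0):
    """Train `base_learner` on bootstrap resamples of the training set
    and majority-vote their predictions (variance reduction)."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_bags):
        # resample with replacement, same size as the original set
        sample = [rng.choice(train) for _ in train]
        label = base_learner(sample)(query)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def one_nn(sample):
    """Stand-in base learner: 1-nearest-neighbor on (x, label) pairs."""
    return lambda q: min(sample, key=lambda xy: abs(xy[0] - q))[1]

train = [(0.0, "lo"), (0.5, "lo"), (9.0, "hi"), (9.5, "hi")]
bag_predict(train, 1.0, one_nn)  # "lo"
```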
The Master Algorithm resides in the circles of optimization town, among the towers of representation and evaluation.
Optimization techniques in different tribes:
Inverse deduction – symbolists
Gradient descent – connectionists
Genetic search (crossover and mutation) – evolutionaries
Constrained optimization – analogizers
What is to be combined?
Decision trees, multilayer perceptrons, classifier systems, naive Bayes, SVMs.
The most important thing in an equation is all the quantities that don’t appear in it.
The universe maximises entropy subject to keeping energy constant.
PCA is to unsupervised learning what linear regression is to supervised learning.
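A sketch of that analogy: the first principal component is the direction of maximum variance, found here via SVD on centered data. Like regression it fits a line, but with no designated output variable (the near-collinear points are made up):

```python
import numpy as np

def first_principal_component(X):
    """Center the data, then take the top right singular vector:
    the unit direction along which the data varies most."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

# Points lying (almost) on the line y = 2x:
X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.1]])
pc1 = first_principal_component(X)
# pc1 points (up to sign) along roughly (1, 2) / sqrt(5).
```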
The five most important personality traits to look for: extroversion, agreeableness, conscientiousness, neuroticism, and openness to experience.
The law of effect
Children explore and adults exploit.
Snippets of reinforcement learning, also known as habits, make up most of what you do.
You don't try to outrun a horse, you ride it. It's not computers vs. humans; it's humans with computers vs. humans without computers.
Whatever is true of everything we have seen is true of everything in the universe – Newton
Time is the principal component of memory.