Over the past few decades, Machine Learning (ML) has steadily become the state-of-the-art technology in a variety of real-time applications. ML has stepped into many fields, notably medical diagnosis with computational approaches, statistics, image processing, graphical games, and so on. This post reviews the machine learning concepts and algorithms that are significant in recent applications, and also discusses the role of ML in Big Data analytics. Various applications are glimpsed with suitable examples, with particular attention given to medical diagnosis. Finally, the limitations of ML are listed.
Machine learning
Machine learning is a subset, or an application, of AI that gives computers and machines the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning emphasizes the development of computer programs that can access datasets and use them to learn on their own. Over the past two decades ML has become one of the mainstays of information technology and computer science. In today's world enormous amounts of data are generated every day. With those ever-increasing amounts of data becoming available, there is good reason to believe that big data analysis will become an even more pervasive ingredient of technological progress, improving many solutions through machine learning and deep learning.
Arthur Samuel, an expert in the field of AI and computer gaming, coined the term "Machine Learning". He defined machine learning as "the field of study that gives computers the capability to learn without being explicitly programmed".
Basic Difference in ML and Traditional Programming
- Traditional Programming: feed DATA (input) + PROGRAM (logic), run it on the machine, and get the output.
- Machine Learning: feed DATA (input) + OUTPUT, run it on the machine during training, and the machine creates its own program (logic), which can then be evaluated during testing (see the sketch below).
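A minimal sketch of this contrast in Python, assuming scikit-learn is installed; the temperature threshold, data, and labels are all invented for illustration:

```python
# Traditional programming: the logic (a fixed rule) is written by hand.
def is_hot_traditional(temp_c):
    return temp_c > 30  # hand-coded threshold

# Machine learning: the logic is inferred from example input/output pairs.
from sklearn.linear_model import LogisticRegression

temps = [[15], [22], [28], [31], [35], [40]]   # inputs
labels = [0, 0, 0, 1, 1, 1]                    # outputs: 1 = "hot"

model = LogisticRegression().fit(temps, labels)  # machine builds its own "program"
print(model.predict([[33]]))                      # evaluate on an unseen input
```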
ML programming
A machine learning program is said to learn from experience E with respect to some set of tasks T and a performance measure P if its performance at the tasks in T, as measured by P, improves with experience E, and it steadily continues to learn. For example, consider a checkers-playing program: E is the experience gathered from past games, T is the task of playing checkers, and P is the probability that the program will win the next game.
This post mainly focuses on machine learning and its applications. The section on data in machine learning gives a brief introduction to what data is, its importance, and its types. Several general machine learning algorithms are also mentioned in this post; they can be broadly classified under four branches, namely supervised, unsupervised, semi-supervised, and reinforcement learning algorithms. Machine learning is now in its golden age thanks to the development of big data and its analytical tools. It plays a major part in shaping the future because of its vast applications in every field, which are covered under machine learning applications. Everything has its dark side, and machine learning likewise has its limitations, with multiple areas of research waiting for improvement. In conclusion, this post discusses machine learning applications and their impact on humans and the virtual Internet world.
Data in Machine Learning
Data is any unprocessed fact, value, text, sound, or picture that has not yet been interpreted and analyzed. Collecting data is the most important part of all data analysis, machine learning, and artificial intelligence: without data it is not possible to train any model or algorithm, and all modern developments and automation would return to dust. Enterprises like Amazon spend tons of money just to collect as much data as possible.
- TRAINING DATA: the dataset used to train a model. This is the data that the model originally refers to (for both input and output) and interprets in order to learn.
- VALIDATION DATA: the dataset used to validate the model as it fits the training data, and to tune its parameters (the values initially set before the model begins learning).
- TESTING DATA: once the model is fully trained, the testing data provides an unbiased evaluation. Given the inputs of the testing data, the model predicts values (without seeing the true outputs). The model is then evaluated by comparing its predictions with the actual outputs present in the testing data; this shows how much the model has learned from its experience with the training data. A sketch of this three-way split follows.
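A minimal sketch, assuming scikit-learn: one dataset is carved into training, validation, and testing portions (the 60/20/20 split is an illustrative choice):

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # 150 labelled samples

# First hold out 20% as the final, unbiased test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the remainder again: 25% of the 80% = 20% for validation.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```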
Types Of Machine Learning Algorithms
Machine learning algorithms differ from each other in their approach, the type of data structures used, the inputs and outputs, and the problem they are designed to solve. They usually fall under the classification of supervised or unsupervised learning; the figure above gives a brief picture of the classification of ML algorithms.
A. Supervised Machine Learning Algorithm
It predicts output results or future events based on the training dataset on which the model is trained. The model generates a function from the known dataset, and the learning algorithm then produces predictions about the output values. After sufficient training, the computer can provide a target for any input. It also compares its output with the correct output to find errors and modify the model accordingly.
- Classification: the ML model draws a conclusion from given test values to determine which category an observation belongs to. For example, when a mobile phone receives an SMS, it can be classified as 'spam' or 'not spam'; the model looks at the trained observational data and filters spam SMS accurately (a minimal sketch follows this list).
- Forecasting: the process of generating predictions about the future based on past and present data; it is generally used to analyze trends.
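A minimal sketch of the spam example, assuming scikit-learn; the tiny SMS dataset is made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["win a free prize now", "call now for free cash",
            "are we meeting tomorrow", "lunch at noon?"]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn text into word counts, then classify with Naive Bayes.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)
print(clf.predict(["free prize call now"]))  # expected: ['spam']
```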
B. Unsupervised Machine Learning
These are machine learning algorithms that cannot produce a distinct, labelled output the way supervised learning algorithms do, hence the name unsupervised. They infer patterns from data without reference to known or labelled outcomes, and they are best used to find hidden patterns in data when you don't need a definite outcome.
- Clustering: the process of partitioning a dataset by forming groups of similar items (based on certain criteria). It is useful for classifying data by grouping.
- Dimension reduction: real-world data may contain values from many different domains, and grouping them under a single topic is difficult. This is where dimension reduction comes in: it reduces the number of values used to extract the required information. A PCA sketch follows.
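A minimal sketch of dimension reduction using PCA (one common technique, assuming scikit-learn): the 64-pixel digits images are compressed down to 2 values per sample:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)    # shape (1797, 64): 64 pixel values each
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)        # (1797, 64) -> (1797, 2)
```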
C. Semi-Supervised Learning
It is a mixture of both supervised and unsupervised learning. Generally, a small amount of labeled data and a large amount of unlabeled data are used for training. A system uses this algorithm to constantly improve its learning accuracy. Semi-supervised models are used for web page classification, speech recognition, and even genetic sequencing.
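A minimal sketch, assuming scikit-learn: most labels are hidden (marked -1) and LabelPropagation infers them from the few known ones:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1   # hide ~90% of the labels

model = LabelPropagation().fit(X, y_partial)
print((model.transduction_ == y).mean())  # fraction of labels recovered correctly
```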
D. Reinforcement Learning
It is a part of machine learning in which agents take actions in an environment so as to exhibit ideal behaviour in a specific context and maximize performance. The algorithm learns by interacting with the environment, which produces a state and a reward that are in turn sent to the agent; simply put, the reward is feedback from which the agent learns. Reinforcement learning needs clever exploration mechanisms. These algorithms offer robotics a framework and a set of tools for complicated designs. A tabular Q-learning sketch follows.
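A minimal tabular Q-learning sketch in plain numpy; the 5-state corridor environment, rewards, and hyperparameters are all invented for illustration:

```python
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

rng = np.random.default_rng(0)
for _ in range(500):                 # episodes
    s = 0                            # agent starts at the left end
    while s != 4:                    # state 4 is the goal
        # Epsilon-greedy exploration: mostly exploit, sometimes explore.
        a = rng.integers(2) if rng.random() < epsilon else int(Q[s].argmax())
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == 4 else 0.0  # reward only at the goal
        # Q-learning update: move Q(s, a) toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))  # learned policy: expect "go right" in every state
```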
Machine Learning For Big Data Analytics
A machine learning algorithm is purely based on the data provided to it: the more data provided, the better it performs. That is why big data has its own significant role in machine learning and artificial intelligence. But what is big data, and what does it do for machine learning? The answer is shown in this post: big data is a collection of data that is interpreted by an analytical system so that an ML program can 'learn' and improve its accuracy.
Machine learning has nowadays become a topic of discussion for everyone, and that is only possible due to the reach of data, i.e. big data. Big data takes most of the credit for providing interpreted, processed data, thus helping ML algorithms continuously improve their prediction accuracy through training and validation. The term Artificial Intelligence was first coined in the 1950s, but not much was developed at that time because there were no devices with the computational power and ability to process huge amounts of data. Big data provides that solution and has its own place in the growth of machine learning and AI in the current Industrial Revolution 4.0.
An example of big data: we use data collected from sensors such as temperature, humidity, pressure, and location, and from various electronic devices, to train a model and test it against possible predictions. The model plays a vital role in identifying the hidden patterns in the data, which leads to improved output accuracy. Structured or unstructured data that is too complex to be processed by traditional methods is called big data.
Most Commonly Used ML Algorithms
Machine learning has a wide range of algorithms, which can be classified as supervised, unsupervised, or other. The most commonly used algorithms are as follows.
Naïve Bayes Classifier Algorithm (Supervised Learning - Classification) :
The Naïve Bayes classifier is based on Bayes' theorem and treats every feature value as independent of any other value. It allows us to predict a class/category from a given set of features using probability. Despite its simplicity, the classifier does surprisingly well and is often used because it can outperform more sophisticated classification methods.
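A minimal sketch, assuming scikit-learn: a Gaussian Naive Bayes classifier fit on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)  # each feature treated as independent
print(nb.score(X_test, y_test))           # held-out accuracy, typically ~0.95+
```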
K Means Clustering Algorithm (Unsupervised Learning - Clustering) :
The K-Means clustering algorithm is a type of unsupervised learning used to categorize unlabeled data, i.e. data without defined categories or groups. The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.
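A minimal sketch, assuming scikit-learn; the three synthetic blobs of points are invented so that K = 3 clusters clearly exist:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three blobs of 50 unlabeled 2-D points each, centred at different spots.
X = np.vstack([rng.normal(c, 1.0, (50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])       # cluster index assigned to each point
print(km.cluster_centers_)   # learned group centres
```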
Support Vector Machine Algorithm (Supervised Learning - Classification) :
Support Vector Machine algorithms are supervised learning models that analyze data for classification and regression analysis. They essentially filter data into categories, which is achieved by providing a set of training examples, each marked as belonging to one of two categories. The algorithm then builds a model that assigns new values to one category or the other.
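A minimal sketch, assuming scikit-learn; the six labelled training points are made up:

```python
from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3], [8, 8], [9, 10], [10, 9]]  # training examples
y = [0, 0, 0, 1, 1, 1]                                   # two categories

svm = SVC(kernel="linear").fit(X, y)   # find the separating boundary
print(svm.predict([[2, 2], [9, 9]]))   # assign new values -> [0 1]
```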
Linear Regression (Supervised Learning - Regression) :
Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.
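A minimal sketch, assuming scikit-learn; the study-hours/exam-scores data is invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1], [2], [3], [4], [5]])   # hypothetical study hours
scores = np.array([52, 58, 65, 70, 78])        # hypothetical exam scores

reg = LinearRegression().fit(hours, scores)
print(reg.coef_, reg.intercept_)   # slope and intercept of the fitted line
print(reg.predict([[6]]))          # predicted score for 6 hours of study
```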
Logistic Regression (Supervised Learning - Classification) :
Logistic regression focuses on estimating the probability of an event occurring based on previously provided data. It is used to model a binary dependent variable, i.e. one where only two values, 0 and 1, represent the outcomes.
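A minimal sketch, assuming scikit-learn; the single feature and 0/1 outcomes are invented:

```python
from sklearn.linear_model import LogisticRegression

X = [[20], [30], [40], [50], [60], [70]]   # made-up feature values
y = [0, 0, 0, 1, 1, 1]                      # binary outcome

lr = LogisticRegression().fit(X, y)
print(lr.predict_proba([[45]]))  # estimated probabilities of outcome 0 vs 1
```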
Artificial Neural Networks (Reinforcement Learning) :
An artificial neural network (ANN) comprises 'units' arranged in a series of layers, each of which connects to the layers on either side. ANNs are inspired by biological systems, such as the brain, and the way they process information. They are essentially a large number of interconnected processing elements working in unison to solve specific problems.
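A minimal ANN sketch, shown here (for simplicity) in a supervised setting using scikit-learn's multi-layer perceptron; the XOR task and network size are illustrative choices:

```python
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]   # XOR: not linearly separable, needs a hidden layer

ann = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=2000, random_state=0)
ann.fit(X, y)
print(ann.predict(X))  # typically reproduces [0 1 1 0]
```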
Decision Trees (Supervised Learning – Classification/Regression) :
A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable – and each branch is the outcome of that test.
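A minimal sketch, assuming scikit-learn: fit a shallow tree on the iris dataset and print the learned flow chart of tests:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))   # each node is a test on a variable; branches are outcomes
```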
Random Forests (Supervised Learning – Classification/Regression) :
Random forests, or 'random decision forests', are an ensemble learning method that combines multiple algorithms to generate better results for classification, regression, and other tasks. Each individual classifier is weak, but when combined with others it can produce excellent results. The algorithm starts with a decision tree (a tree-like graph or model of decisions) into which an input is entered at the top. The input then travels down the tree, with the data being segmented into smaller and smaller sets based on specific variables.
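A minimal sketch, assuming scikit-learn: many trees combined into one stronger ensemble:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 weak decision trees vote together on each prediction.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(rf.score(X_test, y_test))   # ensemble accuracy on held-out data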
K Nearest Neighbor (Supervised Learning) :
The K-Nearest-Neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine which group it is actually in. For example, if one point is on a grid and the algorithm is trying to determine which group that data point is in (Group A or Group B, say), it looks at the data points near it to see which group the majority of them are in.
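A minimal sketch, assuming scikit-learn; the grid points and group labels are made up:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]  # points on a grid
y = ["A", "A", "A", "B", "B", "B"]                     # Group A / Group B

# Each new point takes the majority label of its k = 3 closest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))  # -> ['A' 'B']
```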
Image Recognition
Image recognition, or image processing, is a domain built on machine learning. Image recognition is the ability of a machine to identify objects, places, people, and several other variables in an image, and it is connected to computer vision.
Prisma is a photo editing app that transforms users' photos into works of art by applying the styles of famous artists or different and original patterns. Prisma doesn't simply apply a filter (like Instagram does) but creates new photos following a model and, as the official description states, "a unique combination of neural networks and artificial intelligence helps you turn memorable moments into timeless art."
How does Prisma change an image into a masterpiece?
All ML applications (and Prisma follows the same logic) train from information and parameters, and the models use them to improve their algorithms independently, without human intervention.
Speech Recognition
Speech recognition is the mechanism by which a device identifies or analyses speech in a human language and converts it into text, or vice versa. Speech recognition is also called Automatic Speech Recognition (ASR).

Speech recognition has many applications, such as in-car systems that allow control through audio prompts: simple voice commands can help us initiate phone calls, select radio stations, search contacts, or load MP3s and songs. In healthcare, SR is implemented to take notes on patients, and a document is drafted for every patient, which makes it easy to maintain digital records.
For example, in the aerospace field, NASA's Mars Polar Lander used speech recognition technology from Sensory, Inc. in the Mars Microphone on the lander.
Medical Diagnosis
ML provides various solutions and tools that can help with predictive analysis and analytical problems in many medical domains. It is used to study the importance of clinical values and their combinations for prognosis (e.g. disease prediction), to extract medical knowledge for outcomes research, for therapy planning and reinforcement, and for overall hospital management.
The measurement of Joint Space Width (JSW) in hand X-ray images of patients suffering from Rheumatoid Arthritis (RA) is a time-consuming task for radiologists. Manual assessment lacks accuracy and is observer-dependent, which hinders an accurate evaluation of joint degeneration in early diagnosis and follow-up studies. The table below lists various algorithms used, with their accuracy rates for joint detection in rheumatoid arthritis.
Table: Machine Learning Techniques Used for Automatic Joint Detection in Rheumatoid Arthritis
| Algorithm | Disease | Accuracy |
| --- | --- | --- |
| Joint localization, active shape models | Rheumatoid Arthritis | 96% |
| Manual, colour and K-means image segmentation | Rheumatoid Arthritis | 93% |
| Joint localization, contour delineation, ASM-driven snakes | Rheumatoid Arthritis | 92% |
Statistical Arbitrage
In finance, statistical arbitrage refers to automated trading strategies that are typically short-term and involve a large number of securities. In such strategies, the user tries to implement a trading algorithm for a set of securities on the basis of quantities such as historical correlations and general economic variables. These measurements can be cast as a classification or estimation problem. The basic assumption is that prices will move towards a historical average. A hedged sketch of this mean-reversion idea follows.
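A minimal numpy sketch of the mean-reversion assumption; the price spread is entirely synthetic, and the z-score threshold of 2 is an arbitrary illustrative choice, not trading advice:

```python
import numpy as np

rng = np.random.default_rng(0)
spread = np.cumsum(rng.normal(0, 1, 500)) * 0.1 + 10  # synthetic price spread

window = 50
# Rolling mean and standard deviation over the last `window` observations.
mean = np.convolve(spread, np.ones(window) / window, mode="valid")
std = np.array([spread[i:i + window].std() for i in range(len(mean))])
z = (spread[window - 1:] - mean) / std  # how far from the historical average?

# Signal: short when far above the average, long when far below, else flat.
signal = np.where(z > 2, -1, np.where(z < -2, 1, 0))
print(signal[-10:])
```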
Virtual Personal Assistants
Siri, Alexa, and Google Assistant are prime examples of virtual personal assistants. As the name suggests, the main use of a virtual assistant is to assist with searches in response to spoken commands. They can be used with many applications, from smart home control to a personal assistant that reminds us of important dates. They are improving consistently; for example, in Google's 2018 keynote, the Google Assistant was able to book an appointment by talking with a human assistant. The best examples of virtual assistants are:
- Smart home speakers: Amazon Echo, Google Home
- Smartphones: Samsung Bixby on the Samsung S8, Siri
- Mobile apps: Google Allo
Limitations of Machine Learning

In this modern world, ML offers solutions to many complex real-world problems through its approach, but there is a limit to its level of success in analyzing structured and unstructured information. Although ML has driven transformations in some fields, ML programs often fail to produce the expected results. There are numerous reasons that can make machine learning feeble, such as inadequacy of (suitable) data, lack of access to the data, data bias, privacy issues, improperly chosen tasks, data structures and algorithms, the wrong tools and people, lack of resources, and evaluation problems.
In 2018, a self-driving car developed through Uber's efforts was launched successfully, but it failed to detect a pedestrian, who was unfortunately killed in a collision. Attempts to use ML in healthcare with the IBM Watson system failed to deliver accurate results even after several years and billions of dollars of investment.
In reality, machine learning applications:
– Need data and models that have been prepared manually by people; the process is still not entirely automatic. ML applications never learn entirely on their own; someone needs to teach them the differences between topics, words, concepts, etc.

– Require very large datasets and many examples for training. ML can work out the differences among data only if documents about the different classes are provided during the training process.

– Achieve accurate results only if the training process is repeated with more data. ML can improve its accuracy only by adding, over and over again, more information.

– Need varied patterns. Having data from only one genre makes the model produce less accurate results. ML can differentiate between the several meanings of the same word, or politics from ecology for example, only if these meanings, or other topics like history, medicine, and math, are properly trained into the system.

– Cannot improve in real time. ML can't create a new model beyond the options it offers.