Tuesday, June 2, 2020

Machine Learning Algorithms



    Over the past few decades, Machine Learning (ML) has steadily become the state-of-the-art technology in a wide range of real-time applications. ML has stepped into many fields, most notably medical diagnosis with computational approaches, statistics, image processing, graphical games, and more. This post reviews the machine learning algorithms that are significant in recent applications and discusses the role of ML in Big Data analytics. Various applications are illustrated with suitable examples, with particular attention to medical diagnosis. Finally, the limitations of ML are listed.

Machine learning

Machine learning is a subset, or an application, of AI that gives computers and machines the ability to learn and improve automatically from experience without being directly programmed. Machine learning mainly emphasizes the development of computer programs that can access datasets and use them for self-learning. Over the past two decades, ML has become one of the mainstays of information technology and computer science. In today's world, millions of data records are generated every day. With these ever-increasing amounts of data becoming available, there is good reason to believe that big data analysis will become an even more pervasive ingredient of technological progress, improving many solutions through machine learning and deep learning.

Arthur Samuel, an expert in the field of AI and computer gaming, coined the term “Machine Learning”. He defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed”.

 

Basic Difference in ML and Traditional Programming

  • Traditional Programming: Feed DATA (input) + PROGRAM (logic), run it on the machine and get the output.
  • Machine Learning: Feed DATA (input) + OUTPUT, run it on the machine during training, and the machine creates its own program (logic), which can then be evaluated during testing.
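The contrast above can be sketched in a few lines of Python: in the traditional case we write the Celsius-to-Fahrenheit logic ourselves, while in the machine learning case we hand the machine input/output pairs and let a least-squares fit recover the same logic. The data here is invented purely for illustration.

```python
# Traditional programming: we write the logic (Celsius -> Fahrenheit).
def c_to_f(c):
    return c * 9 / 5 + 32

# Machine learning: we give data (input + output) and let the machine
# recover the logic -- here, a least-squares fit of slope and intercept.
celsius = [-40, 0, 10, 20, 37, 100]
fahrenheit = [c_to_f(c) for c in celsius]      # the "labels"

n = len(celsius)
mean_x = sum(celsius) / n
mean_y = sum(fahrenheit) / n
slope = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(celsius, fahrenheit)) \
        / sum((x - mean_x) ** 2 for x in celsius)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))    # 1.8 32.0
```

The machine never sees the formula, yet it reconstructs the slope 1.8 and intercept 32 from the examples alone.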

ML programming

    A machine learning program is said to learn from experience ‘E’ with respect to some set of tasks ‘T’ and performance measure ‘P’ if its performance at the tasks in T, as measured by P, improves with experience E. For example, consider playing a game of checkers: E is the experience of playing past games, T is the task of playing checkers, and P is the probability that the program will win the next game.

    This post mainly focuses on machine learning and its applications. The section on data in machine learning gives a brief introduction to what data is, its importance, and its types. There are many machine learning algorithms, some of the most general of which are mentioned in this post; they can be broadly classified into four branches, namely supervised, unsupervised, semi-supervised, and reinforcement learning algorithms. Machine learning is now in its golden age thanks to the development of big data and its analytical tools. It plays a major part in shaping the future owing to its vast applications in every field, which are covered under machine learning applications. As everything has its dark side, machine learning also has its limitations, and multiple areas of research are waiting for improvement. In conclusion, this post discusses machine learning applications and their impact on humans and the virtual Internet world.


  Data in Machine Learning



Data is any unprocessed fact, value, text, sound, or picture that has not yet been interpreted and analyzed. Collecting data is the most important part of data analysis, machine learning, and artificial intelligence. Without data, it is not possible to train any model or algorithm, and all modern developments and automation would return to dust. Enterprises like Amazon spend enormous sums of money just to collect as much data as possible.

An example of the importance of data: why did Facebook buy WhatsApp for the huge price of $19 billion?
The answer is simple: to gain access to information about users that Facebook may not have but WhatsApp does. This information is of high importance to Facebook, as it can help improve its services.

 ·    TRAINING DATA: The dataset used to train a model. This is the data the model originally refers to (for both input and output) and interprets in order to learn from it.

 ·   VALIDATION DATA: The dataset used to validate the model, which was fit on the training data, while tuning its parameters (those set before the model begins learning).

 ·  TESTING DATA: In the final phase, once the model is fully trained, the testing data provides an unbiased evaluation. When the inputs of the testing data are given, the model predicts values without being shown the correct outputs. The model can then be evaluated by comparing its predictions with the actual outputs present in the testing data. This is how we measure how much the model has learned from the training data.
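As a minimal sketch of the three splits, assuming scikit-learn is available, the test set is carved out first and the remainder is split again into training and validation data (the 60/20/20 proportions are an arbitrary choice for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # 150 labelled samples

# First carve out the test set (20%), then split the remainder
# into training (75% of the rest) and validation (25% of the rest).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 90 30 30
```

The model is fit on the training set, tuned against the validation set, and scored once at the end on the untouched test set.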


Types Of Machine Learning Algorithms

 The machine learning algorithms differ from each other in their approach, the type of data they use, their inputs and outputs, and the problems they are designed to solve.

 

Machine learning algorithms usually fall under the classification of supervised or unsupervised learning. The figure above gives a brief picture of the classification of ML algorithms.

A.    Supervised  Machine Learning Algorithm

It predicts output results or future events based on the training dataset on which the model is trained. The model generates a function from the known dataset, and the learning algorithm then produces predictions about the output values. After sufficient training, the computer can provide a target for any new input. It also compares its output with the correct output to find errors and modify the model accordingly.

  • Classification: In classification, the ML model draws a conclusion from given test values so that it can determine which category a given observation belongs to. For example, when a mobile phone receives an SMS, it can be classified as ‘spam’ or ‘not spam’, with the model looking at the trained observational data to filter spam messages accurately.
  • Regression: In regression, the ML model estimates and learns the relationships among multiple training values to form a statistical model. Linear regression is the most commonly used type of regression algorithm.
  • Forecasting: Forecasting is the process of generating predictions about the future based on past and present data, and is generally used to analyze trends.

B.   Unsupervised Machine Learning

These are machine learning algorithms that do not produce a distinct output the way supervised learning algorithms do, hence the name unsupervised. They infer patterns from data without reference to known, or labelled, outcomes. They are best used to find hidden patterns in data when you don’t need a definite outcome.

·          Clustering: Clustering is the process of partitioning a dataset by forming groups of similar items (based on a certain criterion). It is useful for classifying data by grouping.

·          Dimension reduction: Real-world data may contain values from many different domains, and grouping them under a single topic is difficult. This is where dimension reduction comes in: it reduces the number of variables used to extract the required information.
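A minimal sketch of dimension reduction, assuming scikit-learn is available: the synthetic data below has 10 features but really only varies along 2 directions, so PCA can compress it with essentially no loss (the data is fabricated for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples with 10 features, but the variance lives in only 2 directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                                 # (100, 2)
print(sum(pca.explained_variance_ratio_) > 0.99)       # True
```

Two components capture essentially all the variance, so downstream algorithms can work with 2 numbers per sample instead of 10.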

C.    Semi-Supervised Learning

It is a mixture of both supervised and unsupervised learning. Generally, a small amount of labeled data and a large amount of unlabeled data are used for training. A system uses this approach to continually improve its learning accuracy. Semi-supervised models are used for web page classification, speech recognition, and even genetic sequencing.

D.    Reinforcement Machine Learning

It is a branch of machine learning in which agents take actions in an environment so as to exhibit ideal behaviour in a specific context and maximize performance. The agent interacts with the environment, which produces a state and a reward that are in turn sent back to the agent; put simply, the reward is the feedback from which the agent learns. Reinforcement learning requires clever exploration mechanisms. These algorithms offer robotics a framework and set of tools for complicated designs.


Machine Learning For Big Data Analytics


A machine learning algorithm is only as good as the data provided to it: the more data provided, the better it performs. That is why big data has its own significant role in machine learning and artificial intelligence. But what is big data, and what does it have to do with machine learning? Big data is a collection of data that is interpreted with analytical systems so that an ML model can ‘learn’ and improve its accuracy.

Machine learning has become a topic of discussion for everyone, and that is only possible thanks to the reach of data, i.e. big data. Big data takes much of the credit for supplying the interpreted and processed data that helps ML algorithms continuously improve their prediction accuracy through training and validation. The term Artificial Intelligence was first coined in the 1950s, but it could not develop much at the time because there were no devices with the computational power to process huge volumes of data. Big data provides that solution and has its own place in the growth of machine learning and AI in the current Industry 4.0 revolution.

An example of big data: we use various data collected from sensors (such as temperature, humidity, pressure, and location) and from electronic devices to train a model and test it with possible predictions. The model plays a vital role in identifying hidden patterns in the data, which leads to improved output accuracy. Structured or unstructured data that is too complex to be processed by traditional methods is called big data.


 Most Commonly Used ML Algorithms

Machine learning has a wide range of algorithms, which can be classified as supervised, unsupervised, or otherwise. The most commonly used algorithms are as follows.

Naïve Bayes Classifier Algorithm (Supervised Learning - Classification) :

The Naïve Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other. It allows us to predict a class or category for a given set of features using probability.
Despite its simplicity, the classifier does surprisingly well and is often used because it can outperform more sophisticated classification methods.
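A hedged sketch of the spam example from earlier in the post, assuming scikit-learn is available; the tiny message corpus is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A toy corpus of labelled SMS messages (invented for illustration).
messages = ["win a free prize now", "free cash offer", "meeting at noon",
            "lunch tomorrow?", "claim your free prize", "see you at the office"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# Turn each message into word counts, then fit Naive Bayes on them.
vec = CountVectorizer()
X = vec.fit_transform(messages)
clf = MultinomialNB().fit(X, labels)

pred = clf.predict(vec.transform(["free prize inside"]))
print(pred)          # ['spam']
```

Because “free” and “prize” only ever appeared in spam messages during training, the classifier confidently flags the new message.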

K Means Clustering Algorithm (Unsupervised Learning - Clustering) : 

The K-Means clustering algorithm is a type of unsupervised learning used to categorize unlabeled data, i.e. data without defined categories or groups. The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.
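A minimal sketch with scikit-learn (assuming it is installed): six unlabeled points form two obvious blobs, and K-Means with K = 2 recovers the grouping without ever seeing labels. The coordinates are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious blobs of unlabeled points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)   # first three points share one label, last three the other
```

Which blob gets label 0 and which gets 1 is arbitrary; only the grouping itself is meaningful.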

Support Vector Machine Algorithm (Supervised Learning - Classification) :

Support Vector Machine algorithms are supervised learning models that analyze data used for classification and regression analysis. They essentially filter data into categories, which is achieved by providing a set of training examples, each set marked as belonging to one or the other of the two categories. The algorithm then works to build a model that assigns new values to one category or the other.
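A hedged sketch of the two-category case described above, assuming scikit-learn is available and using toy data invented for illustration:

```python
from sklearn.svm import SVC

# Training examples, each marked as belonging to one of two categories.
X = [[0, 0], [1, 1], [0, 1], [9, 9], [10, 10], [9, 10]]
y = [0, 0, 0, 1, 1, 1]

# A linear-kernel SVM finds the widest-margin boundary between the classes.
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([[0.5, 0.5], [9.5, 9.5]])
print(pred)   # [0 1]
```

New points are then assigned to whichever side of the learned boundary they fall on.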

 Linear Regression (Supervised Learning/Regression) :

Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.
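A minimal sketch, assuming scikit-learn is available; the advertising-spend-versus-sales numbers are hypothetical and chosen to lie roughly on a line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend vs. sales, approximately linear.
spend = np.array([[1], [2], [3], [4], [5]])
sales = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

model = LinearRegression().fit(spend, sales)
print(round(model.coef_[0], 2), round(model.intercept_, 2))  # ~1.99 and ~0.09
```

The fitted slope and intercept summarize the relationship between the two continuous variables, and `model.predict` extrapolates it to new spend values.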

Logistic Regression (Supervised learning – Classification) :

Logistic regression focuses on estimating the probability of an event occurring based on previously provided data. It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the possible outcomes.
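A hedged sketch, assuming scikit-learn is available; the hours-studied-versus-pass data is invented for illustration of a binary (0/1) outcome:

```python
from sklearn.linear_model import LogisticRegression

# Hours studied vs. outcome: fail (0) or pass (1).
hours = [[0.5], [1], [1.5], [2], [3.5], [4], [4.5], [5]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression().fit(hours, passed)
pred = clf.predict([[1.0], [4.8]])
prob = clf.predict_proba([[4.8]])[0, 1]   # probability of passing
print(pred, prob > 0.5)                   # [0 1] True
```

Unlike linear regression, the model outputs a probability between 0 and 1, which is then thresholded into one of the two classes.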

Artificial Neural Networks (Supervised/Reinforcement Learning) :

An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements, working in unison to solve specific problems.
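A hedged sketch of such interconnected units, assuming scikit-learn is available: a tiny multi-layer perceptron learns the XOR function, which no single linear boundary can separate. The layer size and solver are arbitrary choices for illustration.

```python
from sklearn.neural_network import MLPClassifier

# XOR: not linearly separable, so a hidden layer of units is required.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=2000, random_state=0)
net.fit(X, y)
pred = net.predict(X)
print(pred)
```

The hidden layer lets the network compose simple boundaries into the non-linear one XOR requires, which is exactly what a single-layer model cannot do.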

Decision Trees (Supervised Learning – Classification/Regression) : 

A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable – and each branch is the outcome of that test.
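A minimal sketch, assuming scikit-learn is available; the toy weather data (temperature, humidity, and whether to play outside) is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: [temperature, humidity] -> play outside (1) or not (0).
X = [[30, 85], [27, 90], [22, 70], [21, 65], [20, 60], [31, 95]]
y = [0, 0, 1, 1, 1, 0]

# Each internal node of the fitted tree tests one variable;
# each branch is an outcome of that test.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = tree.predict([[23, 68], [29, 92]])
print(pred)   # [1 0]
```

`sklearn.tree.export_text(tree)` can print the learned flow chart of tests, making the model directly readable.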

Random Forests (Supervised Learning – Classification/Regression) :

Random forests or ‘random decision forests’ is an ensemble learning method, combining multiple algorithms to generate better results for classification, regression and other tasks. Each individual classifier is weak, but when combined with others, can produce excellent results. The algorithm starts with a ‘decision tree’ (a tree-like graph or model of decisions) and an input is entered at the top. It then travels down the tree, with data being segmented into smaller and smaller sets, based on specific variables.
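A hedged sketch of the ensemble idea, assuming scikit-learn is available and using its bundled Iris dataset: many individually weak trees, each trained on a random slice of the data, vote to produce a strong combined classifier.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 decision trees, each fit on a random sample of rows and features,
# combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(acc > 0.9)   # True
```

The randomness decorrelates the trees, so their voting errors tend to cancel out rather than compound.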

K Nearest Neighbor (Supervised Learning) :

The K-Nearest-Neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine which group it actually belongs to. For example, if a point lies on a grid and the algorithm is trying to determine which group it is in (Group A or Group B, say), it looks at the nearby data points to see which group the majority of them belong to.
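The majority-vote idea is simple enough to sketch in plain Python, without any library; the two point groups are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(train, labels, point, k=3):
    """Label `point` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))[:k]
    votes = [labels[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]

train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, [1.5, 1.5]))   # A
print(knn_predict(train, labels, [8.5, 8.5]))   # B
```

There is no training step at all: the algorithm just stores the data and measures distances at prediction time, which is why KNN is often called a lazy learner.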

 Machine learning Applications

Image Recognition

Image recognition, or image processing, is a domain built on machine learning. Image recognition is the ability of a machine to identify objects, places, people, and several other variables in an image by means of computer vision.

Prisma is a photo editing app that transforms users’ photos into works of art by applying the styles of famous artists or different and original patterns. Prisma doesn’t simply apply a filter (like Instagram does) but creates new photos following a model and, as the official description states, “a unique combination of neural networks and artificial intelligence helps you turn memorable moments into timeless art.”

How does Prisma change an image into a masterpiece?

All ML applications (and Prisma follows the same logic) train on information, parameters, and models, and use them to improve their algorithms independently, without human intervention.

Speech Recognition

Speech recognition is the mechanism by which a device identifies and analyses speech in a human language and converts it into text, or vice versa. Speech recognition is also called Automatic Speech Recognition (ASR).

Speech recognition has many applications, such as in-car systems that let drivers control functions with voice prompts. Simple voice commands can help us initiate phone calls, select radio stations, search contacts, and load MP3s or songs. In healthcare, SR is used to take patient notes, drafting a document for every patient and making it easy to maintain digital records.

For example, in the aerospace field, NASA's Mars Polar Lander used speech recognition technology from Sensory, Inc. in the Mars Microphone on the lander.

Medical Diagnosis

ML provides various solutions and tools that can help in predictive analysis and analytical problems in many medical domains. It is used for the study of the importance of clinical values and of their combinations for prognosis, e.g. Disease prediction, for the extraction of medical knowledge for outcomes research, for therapy planning, reinforcement, and for overall Hospital management.

The measurement of Joint Space Width (JSW) in hand X-ray images of patients suffering from Rheumatoid Arthritis (RA) is a time-consuming task for radiologists. Manual assessment lacks accuracy and is observer-dependent, which hinders accurate evaluation of joint degeneration in early diagnosis and follow-up studies. The table below lists various algorithms and their accuracy rates for joint detection in rheumatoid arthritis.

Table: Machine Learning Techniques Used For The Automatic Joint Detection In Rheumatoid Arthritis

Algorithm                                                   | Disease              | Accuracy
Joint localization, active shape models                     | Rheumatoid Arthritis | 96%
Manual, colour and K-means image segmentation               | Rheumatoid Arthritis | 93%
Joint localization, contour delineation, ASM-driven snakes  | Rheumatoid Arthritis | 92%

 Detection of Lung diseases by ML:


The figure shows example patches generated from the annotations of a CT slice. The lung field is displayed in transparent red. The polygons are the ground-truth areas considered pathological. The patches have 100% overlap with the lung, at least 80% overlap with the ground truth, and 0% overlap with each other.

 

Statistical Arbitrage

In finance, statistical arbitrage refers to automated trading strategies that are typically short-term and involve a large number of securities. In such strategies, the user tries to implement a trading algorithm for a set of securities on the basis of quantities such as historical correlations and general economic variables. These measurements can be cast as a classification or estimation problem. The basic assumption is that prices will move towards a historical average.
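The mean-reversion assumption can be sketched in a few lines of Python. The price-spread series and the z-score threshold of 2 are both invented for illustration, not a real trading rule:

```python
import statistics

# Hypothetical price spread between two historically correlated securities.
spread = [1.0, 1.2, 0.9, 1.1, 1.0, 2.4]   # the last value has drifted upward

# Measure how far the latest value sits from its historical average.
mean = statistics.mean(spread[:-1])
stdev = statistics.stdev(spread[:-1])
z = (spread[-1] - mean) / stdev

# Assumed rule: bet on reversion to the mean when the z-score is extreme.
signal = "sell" if z > 2 else "buy" if z < -2 else "hold"
print(signal)   # sell
```

Real statistical-arbitrage systems replace this single z-score with estimation or classification models over many securities, but the underlying bet on reversion to a historical average is the same.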

Virtual Personal Assistants

Siri, Alexa, and Google Assistant are the best-known examples of personal assistants. As the name suggests, the main use of a virtual assistant is to assist with tasks in response to spoken commands. They can be used in many applications, from smart home control to a personal assistant that reminds us of important dates. They are improving consistently; for example, in Google's 2018 keynote, Google Assistant was able to book an appointment by talking with a human assistant. Some of the best examples of virtual assistants are:

    ·      Smart Home speakers: Amazon Echo, Google Home

    ·      Smartphone: Samsung Bixby on the Samsung S8, Siri

    ·      Mobile Apps: Google Allo


 The Limits Of Machine Learning Algorithms

In the modern world, ML offers solutions to many complex real-world problems, but there is a limit to its success in analyzing structured and unstructured information. Although ML has transformed some fields, ML programs often fail to produce the expected results. There are numerous reasons that can make machine learning fall short, such as a lack of (suitable) data, lack of access to data, data bias, privacy issues, badly chosen tasks, data structures and algorithms, the wrong tools and people, lack of resources, and evaluation problems.

In 2018, a self-driving car developed by Uber was tested on public roads, but it failed to detect a pedestrian, who was unfortunately killed in the collision. Attempts to use ML in healthcare through the IBM Watson system failed to deliver accurate results even after several years and billions of dollars of investment.

In reality, machine learning applications:

– Need data or models that have been prepared manually by people, and even then the process is not entirely automatic. ML applications never learn entirely on their own; someone needs to teach them the differences between topics, words, concepts, etc.

– Require very large datasets and many examples for training. ML can work out the differences among data only if documents covering the different cases are supplied during the training process.

– Achieve accurate results only if the training process is repeated with more data. ML can improve its accuracy only by adding more information, over and over again.

– Need varied patterns. Having data only in the same genre makes the system produce less accurate results. ML can differentiate between the several meanings of the same word, or politics from ecology for example, only if these meanings and other topics such as history, medicine, and math are properly trained into the system.

– Cannot improve in real time. ML cannot create a new model beyond the options it already offers.


This post provides an idea of machine learning and the algorithms available for real-world problems. Machine learning is a rapidly developing field in modern computer science. ML has applications in nearly every field of study and is already being deployed commercially. Machine learning can solve problems too difficult or time-consuming for humans, and it can handle repetitive daily jobs where no human intervention is needed. It will be exciting to see where machine learning goes in the next 20 years and how it will change our lives for the better. It is sometimes said that machine learning has advanced too far and that machines will take over humans in the near future, but it is not ‘machines vs. humans’; it is really ‘machines and humans’ joining together today to work for a better tomorrow.


