Saturday, July 29, 2017

Yodiwo joins the Endeavor network


Yodiwo has been officially accepted into the Endeavor network.
Entrepreneur: Alex Maniatopoulos
Company: Yodiwo
Description: What isn’t connected to the Internet nowadays? Though more systems are coming online, implementing Internet of Things (IoT) projects is incredibly complex, time-intensive, and costly. Engineers write thousands of lines of code in order to connect expensive smart devices, manage system workflows, and measure results. Yodiwo offers an affordable, code-free IoT application enablement platform plus customized solutions to expedite the process of constructing IoT applications and interconnected networks. Yodiwo’s proprietary three-tiered platform (Wisper, Cyan, and Alcyone) saves clients 90% on application development time, 30% on system operating costs, and 40-50% on capital expenditure.
After a rigorous multi-day interview, our CEO Alex Maniatopoulos convinced the panelists that Yodiwo has what it takes to join their successful network.

This great milestone opens up a whole new world of business opportunities for Yodiwo.


Thursday, July 6, 2017

Deep Learning: Why you should use gradient clipping

One new tip that I picked up reading Deep Learning is gradient clipping. It has been common knowledge amongst practitioners for years, but somehow I missed it.

Problem

The problem with strongly nonlinear objective functions, such as those computed by recurrent or deep networks, is that their derivatives tend to be either very large or very small in magnitude.

The steep regions resemble cliffs, and they result from the multiplication of several large weights together. On the face of an extremely steep cliff structure, the gradient update step can move the parameters extremely far, usually jumping off the cliff structure altogether and undoing much of the work that had been done to reach the current solution.

The gradient tells us the direction that corresponds to the steepest descent within an infinitesimal region surrounding the current parameters. Outside this tiny region, the cost function may begin to curve back upward. The update must be chosen to be small enough to avoid traversing too much upward curvature. 

Solution

One solution would be to use a very small learning rate, but that is problematic: it slows training and may settle in a sub-optimal region. A much better solution is clipping the gradient (also called norm clipping). There are many instantiations of this idea, but the main concept is to limit the gradient norm to a maximum value and, if the gradient exceeds that value, rescale it so that it stays within the limit. This retains the direction of the gradient but limits the step size.
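
For illustration, here is a minimal sketch of that rescaling step in plain NumPy. The function name, the 1.0 threshold, and the list-of-arrays representation are just assumptions for the example, not tied to any particular framework.

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale a list of gradient arrays so that their combined (global)
    # L2 norm is at most max_norm; the direction is preserved.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads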

Thoughts

If your job involves training a lot of deep learning models automatically, then you should eliminate any unpredictable steps that require manual labor. We are engineers after all, so whatever can be automated should be automated. For me the problem was the unpredictability of training: for a percentage of initializations the gradient would explode. The reactionary solution was to lower the learning rate, but that costs time and money. Besides, I wanted something that always works and can therefore be automated. Gradient clipping worked nicely in this regard, and it allowed me to raise the learning rate so that training converges much faster.

Conclusion

Use gradient clipping everywhere; my default is to clip the norm to 1.0. In Caffe it is a single line in the solver, and if your framework doesn't support it, it is easy to implement yourself. You will save yourself enormous headaches and time.
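
For reference, that single Caffe line goes in the solver prototxt; if I remember the SolverParameter field correctly, it looks like this:

# solver.prototxt: clip the global gradient norm to 1.0
clip_gradients: 1.0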

Gradient Clipping in TensorFlow

1. Global norm clipping (the most common case)


optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
# Rescale all gradients together so their global L2 norm is at most 1.0
gradients, _ = tf.clip_by_global_norm(gradients, 1.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))

2. Local norm clipping (per gradient)


optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
# Clip each gradient tensor's norm individually to 1.0 (leave None gradients alone)
gradients = [None if gradient is None else tf.clip_by_norm(gradient, 1.0)
             for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))

* From the book "Deep Learning"

For a high-level understanding of deep learning, click here

Monday, July 3, 2017

Udacity AI nanodegree

I enrolled in Udacity's AI nanodegree 2 months ago and I just learned I was accepted.
I thought it would be a good refresher and maybe fill in some knowledge gaps I have.
The reviews on the net are pretty good, so I'm sure it will be a great experience, especially since AI legends like Peter Norvig will be doing the teaching.

The curriculum consists of five parts:

  1.  Foundations of AI : In this term, you'll learn the foundations of AI with Sebastian Thrun, Peter Norvig, and Thad Starner. We'll cover Game-Playing, Search, Optimization, Probabilistic AI, and Hidden Markov Models. 
  2.  Deep Learning and Applications : In this term, you'll learn the cutting edge advancements of AI and Deep Learning. You'll get the chance to apply Deep Learning on a variety of different topics including Computer Vision, Speech, and Natural Language Processing. We'll cover Convolutional Neural Networks, Recurrent Neural Networks, and other advanced models. 
  3. Computer Vision : In this module, you will learn how to build intelligent systems that can see and understand the world using Computer Vision. You'll learn fundamental techniques for tasks like Object Recognition, Face Detection, Video Analysis, etc., and integrate classic methods with more modern Convolutional Neural Networks.
  4. Natural Language Processing : In this module, you will build end-to-end Natural Language Processing pipelines, starting from text processing, to feature extraction and modeling for different tasks such as Sentiment Analysis, Spam Detection and Machine Translation. You'll also learn how to design Recurrent Neural Networks for challenging NLP applications.
  5. Voice User Interfaces : This module will help you get started in the exciting and fast-growing area of designing Voice User Interfaces! You'll learn how to build Conversational Agents that make products and services more natural to interact with. You will also dive deeper into the core challenge of Speech Recognition, applying Recurrent Neural Networks to solve it.
In my projects so far I've mostly tackled Computer Vision and Predictive Analytics problems, so it would be a nice change to dive into NLP and Voice processing.
I hope I can fit it into my busy schedule, and I'll try to write some posts describing the experience for any future students.