
Automatic Differentiation

Automatic differentiation (AD) has become a pressing need with the rise of machine learning applications. Since my interests are in machine learning (ML), I will be judging AD from that perspective. Many frameworks have their own way of calculating gradients: some calculate them automatically but restrict users to the control structures and operations the framework provides, while others let users define the derivatives themselves. Both approaches are restrictive. The dream is to have the language handle differentiation like any other arithmetic operation. For a long time that wasn't possible, simply because it wasn't anyone's main concern. Until now.

In recent years, Google picked Swift as the base language onto which to migrate its framework, TensorFlow. One of the main reasons is that Swift is compiled through LLVM and is a mainstream language for developers (I know Swift is used for iOS development; I don't know if there are other fields it is used in…). Without judging the choice of language, I would like to point out that TensorFlow is being integrated into the Swift language itself, which is good and exciting. However, until it is ready for public use, it will stay out of reach of individual users, who would have to fix every bug coming their way and implement any feature that does not exist yet (it may have progressed; my observations are almost a year old). Put differently: if Swift or Swift for TensorFlow does not support a feature I want to use in my research, I must change the Swift or TensorFlow codebase. Both of these projects are massive, and wandering around with research ideas while having to understand, develop, fix, compile, and test inside such massive frameworks is an almost intractable challenge. Before I jump into the second part, I must mention that I used TensorFlow in my Master's thesis and tried Swift for TensorFlow, but that was the end of it: Swift for TensorFlow wasn't mature enough, and even some of the basic examples shown in the keynote weren't working at the time. Disclaimer: I think it is a great and exciting project that will, someday, allow ML to move into new horizons with ease. I, however, needed a working language, one that I could change and fix if necessary.

This is where Julia comes onto the stage. Julia is also compiled through LLVM, but just in time (JIT), and its focus is scientific: nearly every library is written in Julia itself, so you can fix a bug even if it is inside a library and simply run it again. That was like a dream in my search for a base language in which to implement and experiment with my research ideas. The second good thing is that Julia was already well into integrating AD, being JIT compiled and heavily supportive of metaprogramming, which I find more extensive than what Python provides. I am not denigrating Python; I used it for a long time and did great projects with it (involving metaprogramming) that I could never have done in Java or C (that easily, at least).

Julia had many AD libraries when I started using it. Some of them record the operations on a tape and recompile it so that it can run fast (and it does, I assure you). However, this eliminates the possibility of differentiating through loops and conditionals, which is what I needed most. I could re-record and re-differentiate the tape on every call, but that would be slow too (the sketch below shows the kind of control flow I mean).
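To make that limitation concrete, here is a minimal sketch of a function whose computation path depends on its input, differentiated with ForwardDiff.jl, an operator-overloading forward-mode package that does handle such code. The function `f` is my own toy example, not from any library; a tape recorded at one input would replay the wrong loop length at another:

```julia
using ForwardDiff   # forward-mode AD via operator overloading

# The loop count depends on x, so a fixed tape cannot represent f
# for all inputs; conditionals cause the same problem.
function f(x)       # assumes x > 0 so the loop terminates
    y = x
    while y < 100   # number of iterations depends on x
        y = 2y
    end
    return y * x
end

ForwardDiff.derivative(f, 3.0)   # 384.0; near x = 3, f(x) == 64x^2
```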
When you want to back-propagate a lot, that starts to matter… Then I found another AD library, one that was already integrated into an ML library (I say library on purpose: Julia is somewhat allergic to frameworks; the motto is to have libraries, not frameworks, or at least that was the gist I got…). My initial impression was that it was great; I had never been that happy during my short research career. Freedom, functionality, and speed at the same time. One doubts when one hears such a sentence and thinks: too good to be true? It is true, but at a cost. When everything is a library, you need to optimize your own code, since you took your freedom back from the framework that used to speed things up for you… That wasn't a big problem for me, because I use it for research, the slowdown isn't much, and I can optimize the parts I care about the way I like. I must mention that "slow" Julia is still going to be way faster than the average slow one can imagine… With this, I moved along and learned a lot of things by implementing actual code rather than using the constructs of a framework. Yet even while happily creating my own libraries and integrating them with ease, I thought I would never come to the point of needing to create my own AD library. Well, I came to that point, and yes, I have created my AD library. It was a lot easier than I thought a year ago.

I decided to build one when I watched one of Prof. Edelman's talks, "Automatic Differentiation in 10 minutes with Julia". It was a very quick, unexpected, and functional implementation of forward differentiation using the concept of dual numbers, an extension of the real numbers that lets the derivative of each operation be calculated alongside its value. It was a very impressive presentation that showed the power of Julia on stage, even though the professor was modest about it (the first sketch below shows the idea). With that inspiration, I created a type called BD (for reverse differentiation). It does not exactly stand for "Backward Dual" or anything similar; I think of it more as "Backward Derivation". What I have in mind is to run each operation forward and, at the same time, create a function that calculates the reverse differentiation of the operation that just ran. As the operations come one after another, they programmatically create a network of linked functions (the second sketch below). All I had to do was implement the necessary interfaces for my new type. They were, very basically: type promotion, indexing, broadcasting, the arithmetic operations, and some special functions. I haven't implemented them all, but even at this stage it works impressively (in my opinion, of course). Another disclaimer: I am not a mathematician, nor a person whose primary interest is AD. But trust me, I am an engineer…
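Here is a minimal dual-number sketch in the spirit of the talk, reconstructed from memory rather than copied from it; all the names (`Dual`, `v`, `d`, `derivative`) are my own:

```julia
struct Dual <: Number
    v::Float64   # the value
    d::Float64   # the derivative, carried along with the value
end

import Base: +, -, *, /, convert, promote_rule

+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
-(a::Dual, b::Dual) = Dual(a.v - b.v, a.d - b.d)
*(a::Dual, b::Dual) = Dual(a.v * b.v, a.d * b.v + a.v * b.d)            # product rule
/(a::Dual, b::Dual) = Dual(a.v / b.v, (a.d * b.v - a.v * b.d) / b.v^2)  # quotient rule

# Promotion lets Duals mix with plain numbers in the same expression.
convert(::Type{Dual}, x::Real) = Dual(x, 0.0)
promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual

derivative(f, x) = f(Dual(x, 1.0)).d   # seed dx/dx = 1, read off f'(x)

derivative(x -> x^2 + 3x, 2.0)         # 7.0, i.e. 2x + 3 at x = 2
```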
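And here is a rough sketch of the linked-backward-functions idea; this is illustrative, not the actual code of my package. Each operation stores a closure that pushes the gradient back to its inputs, so the forward pass builds the reverse network as a side effect:

```julia
mutable struct BD <: Number
    v::Float64       # forward value
    grad::Float64    # gradient accumulated during the reverse pass
    back::Function   # closure that pushes `grad` on to the inputs
end
BD(v) = BD(v, 0.0, () -> nothing)

import Base: +, *

function +(a::BD, b::BD)
    out = BD(a.v + b.v)
    out.back = () -> begin           # d(a + b) flows unchanged to a and b
        a.grad += out.grad
        b.grad += out.grad
        a.back(); b.back()
    end
    return out
end

function *(a::BD, b::BD)
    out = BD(a.v * b.v)
    out.back = () -> begin           # product rule, in reverse
        a.grad += b.v * out.grad
        b.grad += a.v * out.grad
        a.back(); b.back()
    end
    return out
end

x = BD(2.0); y = BD(3.0)
z = x * y + x     # the forward pass links the backward closures
z.grad = 1.0      # seed dz/dz = 1
z.back()          # run the reverse pass through the network
(x.grad, y.grad)  # (4.0, 2.0): dz/dx = y + 1, dz/dy = x
```

A real implementation also has to visit each node exactly once, in reverse topological order; the naive recursive `back` here can double-count gradients on more tangled graphs.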

Link to the GitHub page of my AD package
