automatic-differentiation, julialang, projects

A Little Update for AutoDiff.jl

Recently, I blogged about Automatic Differentiation (AD) and my motives for developing an AD library to experiment with. Even though I only recently made it public, I have been investigating other Julia AD packages, especially how they define their differentiation rules. Unless a Julia package is designed for a very specific task, it usually consists of multiple sub-packages that together provide its functionality. The Julia ecosystem encourages designing packages as multiple sub-packages, each handling part of the requirements. This has two positive effects: one can find a package for almost any need, and every project contributes its sub-packages back to the ecosystem, which is good for open source (there are nice tips and guides on the Julia Blog).

What is my point? The point is that some of the earliest Julia AD libraries use their own sets of differentiation rules. They also provide ways to define custom rules and bypass the automatic differentiation process. That is good; however, the behavior of the AD process changing every time one switches libraries is not very intuitive. On the other hand, it is not easy to make one library work with another, because the focus of each library varies; some deliberately limit features and functionality to gain speed.

Since I have learned more about Julia’s ecosystem, I have started digging into the collections maintained by organizations on GitHub. JuliaDiff is one of them. It is really nice to be able to find most of the packages developed for a purpose under the same roof. There are multiple Julia packages for AD, including very well documented ones. However, they seem to opt out of modularity. I have been looking into DiffRules.jl for a while. It has a very nice set of differentiation rules, plus ways to define custom rules. Most importantly, it returns symbolic derivatives for operations, which makes things extensible… Some of the AD packages I have mentioned so far (without naming names) are also modular (in their own way…), but they do not use the already existing DiffRules.jl package, which only causes the same differentiation rules to be re-defined every time someone develops a new library (for example, I couldn’t use a very popular one because it wasn’t extensible). I believe DiffRules.jl is generic enough: since it returns derivative expressions symbolically, any library could have been built on top of it or could transform the rules at package initialization (like I did this time). However, most did not…
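To make “symbolic” concrete, here is a small illustrative snippet (the exact expressions you get back depend on the installed DiffRules.jl version):

```julia
using DiffRules

# The derivative rule comes back as an expression, not a number, so a library
# can splice it into whatever code it wants to generate.
dsin = DiffRules.diffrule(:Base, :sin, :x)      # e.g. :(cos(x))

# Rules for two-argument functions return one expression per argument.
dx, dy = DiffRules.diffrule(:Base, :^, :x, :y)
```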

As I have published AutoDiff.jl library, I have also re-defined a small subset of functions(Because this was just an experimental project on mine). However, in order to be functional, more functions needed to be defined. As one can guess I did not define them by hand… Because there is DiffRules.jl. Instead, I have followed some kind of a meta-programming approach. I was able to generate AutoDiff.jl compatible rules out of every rule defined in the DiffRules.jl. This means now, AutoDiff.jl potentially much more capable(as much as DiffRules.jl allows at least). An interesting thing is that only 64 lines of code were enough for it to maintain its existing functionality and expand on it…
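The actual change lives in the repository, but the general shape of such a meta-programming pass looks roughly like the sketch below. The `Tracked` type and its `backward` field are hypothetical stand-ins for whatever the AD library records, not the real AutoDiff.jl internals:

```julia
using DiffRules

# Hypothetical tracked-number type standing in for the library's own type.
struct Tracked{T<:Real} <: Real
    value::T
    backward::Function        # accumulates the adjoint when called
end

# Walk every rule DiffRules knows about and generate a matching method.
for (mod, f, arity) in DiffRules.diffrules()
    (mod === :Base && arity == 1) || continue          # keep the sketch small
    deriv = DiffRules.diffrule(mod, f, :(x.value))     # symbolic f'(x.value)
    @eval function $mod.$f(x::Tracked)
        y = $mod.$f(x.value)                           # ordinary forward pass
        back = Δ -> x.backward(Δ * $deriv)             # chain rule: Δ * f'(x)
        Tracked(y, back)
    end
end
```

With a loop like this, every unary function DiffRules covers (sin, exp, log, …) gets a method for the tracked type without writing each rule by hand.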

The latest changes have been merged into the master branch, since I have not observed any loss of functionality.

automatic-differentiation, julialang, projects

Automatic Differentiation

Automatic Differentiation (AD) has become a very popular need with the rise of machine learning applications. Since my interests are in Machine Learning (ML), I will be judging AD from this perspective. Many frameworks have their own way of calculating gradients. Some calculate them automatically but restrict users to the control structures and operations they provide, and some allow users to define derivatives themselves. However, both approaches are restrictive. The dream is to have the language handle differentiation like any other arithmetic operation. That wasn’t possible, because it wasn’t a main concern, until now.

In recent years, Google has considered the Swift language as the base to migrate its framework, Tensorflow, to. One of the main reasons is that Swift is JIT-compiled by LLVM and is a mainstream language for developers (I know Swift is used for iOS development, and I don’t know if there are other fields it is used in…). Without judging the choice of language, I would like to point out that Tensorflow is being integrated into the Swift language, which is good and exciting. However, until it is ready for public use, it will be out of reach of individual users, who would have to fix every bug coming their way and implement any feature that does not exist (it may have progressed; my observation was almost a year ago). Put differently, if Swift or Tensorflow does not support a feature I would like to use in my research, I must change the Swift or Tensorflow codebase. Both of these projects are massive; wandering around with research ideas while having to understand, develop, fix, compile, and test inside such massive frameworks is an almost intractable challenge. Before I jump into the second part, I must mention that I used Tensorflow in my Master’s Thesis and tried Swift for Tensorflow, but that was the end, because Swift for Tensorflow wasn’t mature enough, and even some of the basic examples shown in the keynote weren’t working at the time. Disclaimer: I think it is a great and exciting project that will, someday, allow ML to move into new horizons with ease; however, I needed a working language that I could change and fix if necessary.

This is where Julia comes onto the stage. Julia is also a language JIT-compiled by LLVM. However, this time the focus is scientific, and every library is written in Julia, so one can even fix a bug in a library and run it again. That was like a dream in my search for a base language in which to implement and experiment with my research ideas. The second good thing is that Julia was already well into integrating AD, because the language is JIT-compiled and heavily supports meta-programming, which I think is more extensive than what Python provides. I am not denigrating Python; I used it for a long time and did great projects with it (involving meta-programming) that I could never have done in Java or C (not that easily, at least).

Julia already had many AD libraries when I started using it. Some of them record operations on a tape and recompile it so that it runs fast (and it does, I assure you). However, this eliminates the possibility of differentiating through loops and conditions, which is what I needed most. I could use such a library to regenerate the reverse differentiation each time, but that would be slow too.
When you want to back-propagate a lot, it starts to matter… Then I found another AD library, which was already integrated into an ML library (I say library on purpose, because Julia is kind of allergic to frameworks; the motto is to have libraries, not frameworks, at least that was the gist I got…). My initial impression was that it was great; I had never been that happy during my short research career. Freedom, functionality, and speed at the same time. One doubts when hearing such a sentence and thinks it is too good to be true. It is true, but at a cost: when everything is a library, you need to optimize your code yourself, since you have taken your freedom back from the framework that would otherwise speed it up for you… It wasn’t a big problem for me, because I am using it for research, the slowdown isn’t much, and I can optimize the parts I care about the way I like. I must mention that “slow” Julia is still going to be way faster than the average slow one can imagine… With this, I moved along and learned a lot by implementing actual code rather than using the constructs of a framework. However, happy as I was creating my own libraries and integrating them with ease, I thought I would never come to the point where I needed to create my own AD library. Well, I came to that point, and yes, I created my own AD library. It was a lot easier than what I had thought a year ago.

I decided to build one when I watched one of Prof. Edelman’s talks, “Automatic Differentiation in 10 minutes with Julia”, which was a very quick, unexpected, and functional implementation of forward differentiation using the concept of dual numbers, an extension of the real numbers that allows the derivative of an operation to be calculated simultaneously with its value. It was a very impressive presentation that showed the power of Julia on stage, even though the professor was modest about it. With that inspiration I created a type called BD (for reverse differentiation); it is not exactly “Backward-Dual” or something similar, I think of it more as “Backward-Derivation”. The idea is to run the operations forward and, at the same time, create a function that calculates the reverse differentiation of the operation that just ran. Therefore, as the operations come one after another, they programmatically create a network of linked functions. All I had to do was implement the necessary interfaces for my new type: basically type promotion, indexing, broadcasting, arithmetic operations, and some special functions. I haven’t implemented them all, but even at this stage it works impressively (in my opinion, of course). Another disclaimer: I am not a mathematician, nor a person whose primary interest is AD. But trust me, I am an engineer…
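For readers unfamiliar with the two ideas above, here are two tiny sketches written from scratch (not taken from the talk or from AutoDiff.jl). First, the dual-number (forward) idea: every operation updates the value and its derivative together.

```julia
# Minimal dual number: carries a value and its derivative side by side.
struct Dual <: Real
    val::Float64
    der::Float64
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

x = Dual(3.0, 1.0)     # seed dx/dx = 1
y = x * x + x          # y.der == 7.0, the derivative of x^2 + x at x = 3
```

Second, a rough sketch of the linked-backward-functions idea; this is not the actual BD implementation, just one way such a type could look:

```julia
# Each operation stores a closure that pushes the incoming adjoint back to its
# inputs, so consecutive operations link up into a chain of backward functions.
mutable struct BD <: Real
    val::Float64
    grad::Float64
    back::Function                    # called with the adjoint arriving from above
end
BD(v) = BD(v, 0.0, Δ -> nothing)      # leaf node: nothing further to propagate

function Base.:*(a::BD, b::BD)
    out = BD(a.val * b.val)
    out.back = Δ -> begin
        a.grad += Δ * b.val; a.back(Δ * b.val)    # chain rule toward a
        b.grad += Δ * a.val; b.back(Δ * a.val)    # chain rule toward b
    end
    return out
end

x = BD(3.0); y = BD(4.0)
z = x * y
z.back(1.0)              # start the reverse pass with dz/dz = 1
(x.grad, y.grad)         # (4.0, 3.0)
```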

Link to the GitHub page of my AD package
