GSoC TensrFlow

Task

To develop a library of traditional machine learning algorithms based on Swift for TensorFlow. Swift for TensorFlow is a next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design and beyond. Its advantages include seamless integration with a modern general-purpose language, allowing for more dynamic and sophisticated models. The defined task was to create a library containing traditional machine learning algorithms.

In a collaborative effort with Victor Antony(GSoC student) and GSoC mentors Dan Zheng, Richard Wei and Paige Bailey, we have created a machine learning library based on Swift for TensorFlow - swiftML.

Community bonding period:

I learned about Swift Package manager, Swift for TensorFlow project and Google’s Swift coding style. I got an insight about the Swift For TensorFlow’s Tensor, different operations on Tensor, Raw TensorFlow operations and integration with a modern general-purpose language. It helped me in writing mathematical operations and implementing overall algorithms.

First coding phase:

In the first coding phase, I have implemented linear regression, logistic regression, K-means clustering, K nearest neighbors classifier and regressor. Along with this, I have implemented different helper tensor operation are pseudoinverse of the matrix, euclidean distance and minkowski distance.

  • Linear Regression implemented using two different methods singular value decomposition and gradient descent.
  • Logistic Regression implementation is based on the sigmoid function of the dependent variable. In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme.
  • K-means clustering algorithms along with random initialization of centroids supports Kmeans++ - Heuristic Initialization of centroids for better performance.
  • K nearest neighbors classifier classify based on k-nearest neighbors vote, K nearest neighbors regressor predicts based on local interpolation of the targets nearest to neighbors in the training set and K nearest neighbors classifier and regressor both supports two weight functions used in the prediction are uniform - All points in each neighborhood are weighted equally and distance - weight points by the inverse of their distance.

Second coding phase:

In the second coding phase, I have implemented principal component analysis, bernoulli naive bayes, multinomial naive bayes and gaussian naive bayes. Along with this, I have implemented a helper tensor operation to compute the determinant of singular vector decomposition (SVD).

  • Principal component analysis implementation contains SVD based implementation along with Minka’s MLE implementation to guess the component dimension if the component count is not provided.
  • Multinomial naive bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts.
  • Bernoulli naive bayes classifier is suitable for discrete data. The difference is that while Multinomial naive bayes works with occurrence counts, Bernoulli naive bayes is designed for binary features.
  • Gaussian naive bayes classifier is suitable for continuous variables.

Third coding phase:

The major part of this period was to avoid memory leaks and making design safe, documenting the code, creating end to end example notebooks.

Next step:

The swiftML project will be continued on improvements and hoping to become a scikit-learn for Swift.

Acknowledgment:

I would like to acknowledge and extend my heartfelt gratitude to my mentors Dan Zheng, Richard Wei and Paige Bailey for guiding me throughout the development process. It is great to be a part of the TensorFlow team. I would also like to Google Summer of Code for providing me this opportunity and giving me a chance to work with TensorFlow.

  • Project Link: The swiftML project repository contains all the GSoC 2019 work and current developments: swiftML
  • End-to-end examples - https://github.com/param087/swiftML/tree/master/Notebooks
  • Following are the merged PRs - (dates differ from the actual coding period timeline as the changes were merged into above repo later than their actual implementation)
    • Linear Regression - 5
    • Logistic Regression - 7
    • KMeans Clustering - 9
    • K Nearest Neighbors classifier and regressor - 11
    • Principal Component Analysis - 15
    • Bernoulli, Multinomial and Gaussian Naive Bayes - 13