Recent projects:

I. Machine Learning

Interest rate prediction

This project aims to predict the base interest rates in Australia, the US, and the UK over the coming months. It demonstrates a complete machine learning and data engineering cycle, including an ETL process that collects and transforms data from several sources. The data is stored in Azure Storage and used for model development. The model is then deployed and served as a web service, which pushes its predictions to a GitHub page for visualization.

View on GitHub

Project dashboard

Data pipelines and architecture

Modelling baseball players’ performance

This work was supported by a Google Summer of Code project that adds support for multi-output Gaussian processes to PyMC.

We model the performance of different sports players using multi-output Gaussian processes (MOGPs), which can learn and infer many outputs simultaneously when those outputs share the same source of uncertainty. The following picture shows the estimated spin rates of three top pitchers on different game dates. Please check the PyMC example for further details.
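As a rough illustration of how one GP can couple several outputs, the sketch below builds an intrinsic coregionalization covariance in plain Python: the covariance between two observations is a between-player term B[i][j] multiplied by an ordinary kernel over game dates. This is one common MOGP construction and is only a minimal sketch; the function names, the toy data, and the values of W and kappa are assumptions made for illustration, not code from the project or from PyMC.

```python
import math


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel between two scalar inputs (e.g. game days)."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def coregion_matrix(W, kappa):
    """B = W W^T + diag(kappa): covariance between outputs (here, players)."""
    n = len(W)
    B = [[sum(a * b for a, b in zip(W[i], W[j])) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        B[i][i] += kappa[i]
    return B


def icm_kernel(points, B, lengthscale=1.0):
    """Covariance over (input, output_index) pairs: B[i][j] * k(x, x')."""
    return [[B[i][j] * rbf(x, y, lengthscale) for (y, j) in points]
            for (x, i) in points]


# Hypothetical toy data: (game day, player index) for three players.
points = [(0.0, 0), (1.0, 0), (0.0, 1), (1.0, 2)]
W = [[1.0], [0.8], [0.5]]   # assumed rank-1 mixing weights
kappa = [0.1, 0.1, 0.1]     # assumed per-player independent variance

B = coregion_matrix(W, kappa)
K = icm_kernel(points, B)
```

Because the off-diagonal entries of B are non-zero, an observation for one player shifts the posterior for the others, which is what lets the model learn the outputs jointly.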

View on GitHub View on PyMC

Aussie Social Sentiment Analysis

This project collects data from Twitter’s APIs, then cleans it and stores it in a SQL database. The data is then used to train a sentiment analysis model and predict the sentiment of other tweets. The web app uses Dash visualisations and is deployed on Heroku.

View on GitHub View Demo

II. MLOps & Data Engineering

Accelerating CI test pipelines

Continuous Integration and Continuous Delivery (CI/CD) plays an essential role in MLOps and DataOps. One of the most popular CI/CD tools is GitHub Actions. To speed up CI/CD pipelines, we can use the matrix strategy in GitHub Actions to run tasks in parallel.

For example, I recently used the strategy.matrix feature to cut the CI test pipelines for PyTensor from 75 mins to around 26 mins (a 65% reduction in running time). Please check this pull request for further details.
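To make the idea concrete, the sketch below mimics in Python how a matrix fans out: GitHub Actions creates one job per combination of the matrix values, and those jobs can run in parallel. The axis names and values here are hypothetical and are not PyTensor's actual configuration; the sketch also ignores the include and exclude keys. The last lines check the reduction figure quoted above.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo))
            for combo in product(*(matrix[k] for k in keys))]


def percent_reduction(before, after):
    """Relative reduction in running time, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical matrix: two Python versions times three test subsets.
matrix = {
    "python-version": ["3.9", "3.11"],
    "test-subset": ["tests/a", "tests/b", "tests/c"],
}

jobs = expand_matrix(matrix)
print(len(jobs))                          # 6 jobs, one per combination
print(round(percent_reduction(75, 26)))   # 65
```

Splitting a long test suite into subsets along one matrix axis is what turns a single sequential run into several shorter parallel ones, so the wall-clock time approaches that of the slowest subset.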

View on GitHub

Building Docker images for PyMC

I built a Dockerfile for PyMC v4 that supports both GPU and CPU versions. The Dockerfile has been merged into the PyMC project’s code base in this pull request.

The Docker image is published on Docker Hub, so users can easily pull it and set up their environment.

View on GitHub

III. Data Science

Data Visualisation: Melbournian Daily Activities

This project visualises the daily activities of Melbournians in different areas.

View on GitHub View Demo

Domain Scenery Views - Telegram Bot

A Telegram bot written in Python that automatically pushes the top listings to the Domain channel on Telegram each day. The top listings show properties with the most beautiful views, such as beaches, lakes, and city views, in Australian cities. Customers can join the Domain Scenery Views channel on Telegram to receive them.
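A daily push like this can be sketched with the standard library alone by posting to the Telegram Bot API's sendMessage method. The sketch below only builds the request and does not send it. The listing fields (title, url, view_score), the ranking rule, the token, and the channel name are all hypothetical placeholders, not the bot's actual code or data.

```python
import json
from urllib import request

API_URL = "https://api.telegram.org/bot{token}/sendMessage"


def top_listings(listings, n=3):
    """Pick the n listings with the highest (hypothetical) view score."""
    return sorted(listings, key=lambda item: item["view_score"], reverse=True)[:n]


def format_message(listings):
    """Render the selected listings as a numbered plain-text message."""
    lines = [f"{i}. {item['title']} ({item['url']})"
             for i, item in enumerate(listings, 1)]
    return "Today's top scenery listings:\n" + "\n".join(lines)


def build_request(token, chat_id, text):
    """Build (but do not send) a POST request to the sendMessage method."""
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return request.Request(
        API_URL.format(token=token),
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sample data for illustration only.
sample = [
    {"title": "Beach house", "url": "https://example.com/1", "view_score": 0.9},
    {"title": "City flat", "url": "https://example.com/2", "view_score": 0.4},
    {"title": "Lake cabin", "url": "https://example.com/3", "view_score": 0.7},
]
message = format_message(top_listings(sample, n=2))
req = build_request("TOKEN", "@example_channel", message)
# Sending would be: request.urlopen(req), with a real bot token and channel.
```

Running this once a day from a scheduler is one way to get the daily push; how the real bot is scheduled is not stated here.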

View on GitHub

For more projects, please check my GitHub: @danhphan.