With a background in machine learning and data science, I've worked on projects that involved building, deploying, and monitoring machine learning models in production.
At New Work, my team and I built a machine learning platform that automated the entire machine learning lifecycle.
We built and automated the pipelines on AWS using Metaflow, AWS Step Functions, and AWS Batch, with scheduling handled by Amazon CloudWatch Events. The platform was provisioned with Terraform and Terragrunt on AWS and automated via GitHub Actions. Models were deployed with BentoML, packaged as images on Amazon ECR.
At New Work, my team and I also built a vector database platform for the entire organization. This involved building a platform around the Qdrant vector database and deploying it on AWS EKS (managed Kubernetes) using Terraform and Terragrunt.
We also built an API in Rust on top of the vector database, exposing both REST and gRPC endpoints for querying and searching vectors. The API used AWS ElastiCache for Redis as a cache and was deployed on AWS EKS via GitHub Actions, with monitoring in Amazon CloudWatch.
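For illustration, the shape of a request to Qdrant's REST search endpoint, and the deterministic Redis key one can derive from it for caching, can be sketched in Python (function names and the key prefix are hypothetical; the production API itself was written in Rust):

```python
import hashlib
import json


def build_search_request(vector, top_k=10, must_filter=None):
    """Build the JSON body for Qdrant's REST search endpoint:
    POST /collections/<collection>/points/search
    """
    body = {"vector": vector, "limit": top_k, "with_payload": True}
    if must_filter:
        # Qdrant filters are expressed as must/should clauses on payload keys.
        body["filter"] = {
            "must": [{"key": k, "match": {"value": v}}
                     for k, v in must_filter.items()]
        }
    return body


def cache_key(body):
    """Deterministic Redis key: identical searches hash to the same key."""
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return f"vecsearch:{digest}"
```

Hashing the canonicalized request body means the cache layer never has to understand query semantics — any byte-identical search hits the same Redis entry.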
At New Work, my team and I also built a feature store proof of concept (POC) for the entire organization to augment our ML platform offering.
This involved building a feature store using Feast and deploying it on AWS with Terraform and Terragrunt via GitHub Actions. We used AWS DynamoDB as the online feature store and AWS Athena over S3 as the offline feature store.
The feature registry was hosted on Amazon RDS, and the feature definitions were synced from a central GitHub repository.
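A Feast feature definition of the kind that lived in that central repository looks roughly like this — entity, field, and path names below are entirely hypothetical, and the sketch uses a generic file source, whereas the actual POC pointed the offline store at Athena/S3 and the online store at DynamoDB via `feature_store.yaml`:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entity: rows are keyed by a user id.
user = Entity(name="user", join_keys=["user_id"])

# Placeholder source; the real offline store was Athena over S3,
# configured in feature_store.yaml rather than in the definition file.
source = FileSource(
    path="s3://example-bucket/user_stats.parquet",
    timestamp_field="event_ts",
)

user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="clicks_7d", dtype=Int64),
        Field(name="ctr_7d", dtype=Float32),
    ],
    source=source,
)
```

Running `feast apply` against definitions like these is what pushes them into the RDS-backed registry, so the Git repository stays the single source of truth.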
Feature serving was done through a Python FastAPI service deployed on AWS ECS. The work remained a POC and was not deployed to production.
During my time at Red Bull, I worked on many projects related to marketing analytics, building statistical models to understand the impact of marketing efforts on sales.
This involved combining digital and traditional marketing data from providers such as Nielsen and building models to quantify how marketing activities and events drove sales.
I used Python, R, SQL and Gretl over the course of these projects.
(Image credits: Jellyfish)
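As a toy illustration of the kind of model this work involved — not Red Bull's actual specification, and with entirely synthetic numbers — a geometric adstock transform captures the carryover effect of media spend, after which ordinary least squares recovers the channel's contribution to sales:

```python
import numpy as np


def adstock(spend, decay=0.5):
    """Geometric adstock: each period's spend carries over into later
    periods with exponentially decaying weight."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + decay * carry
        out.append(carry)
    return np.array(out)


# Synthetic example: sales = intercept + beta * adstocked TV spend.
tv = np.array([100, 0, 0, 50, 0, 0], dtype=float)
x = adstock(tv, decay=0.5)
sales = 200 + 1.5 * x  # ground truth generated for the sketch

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
# coef recovers the intercept (~200) and the media effect (~1.5)
```

Real marketing-mix models add seasonality, pricing, and multiple channels, but the adstock-then-regress structure is the common core.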
During my brief stint at a very early-stage startup, I worked on building predictive models to score influencers based on their social media presence and interactions.
This involved pulling data from social media APIs, building models to predict each influencer's future impact, and designing a scoring system to rank them.
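A toy version of such a scoring system — the weights, caps, and normalization constants here are entirely hypothetical — combines reach, engagement, and growth into a single rankable number:

```python
import math


def influencer_score(followers, avg_engagement_rate, growth_rate,
                     w_reach=0.4, w_engagement=0.4, w_growth=0.2):
    """Composite 0-100 score from three normalized signals.
    All weights and caps are illustrative, not the startup's real ones."""
    # Log-scale reach so 10x more followers != 10x the score;
    # normalize against a ~100M-follower ceiling (10^8).
    reach = math.log10(max(followers, 1)) / 8
    # Cap engagement at 10% and growth at 20%/month before normalizing.
    engagement = min(avg_engagement_rate / 0.10, 1.0)
    growth = min(max(growth_rate, 0.0) / 0.20, 1.0)
    return 100 * (w_reach * reach + w_engagement * engagement + w_growth * growth)


# Ranking is then a sort by score, highest first:
# ranked = sorted(influencers, key=lambda i: influencer_score(*i), reverse=True)
```

The log transform on followers is the main design choice: without it, a handful of mega-accounts dominate the ranking regardless of engagement.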
While doing my PhD in Artificial Intelligence at Universidad Politécnica de Madrid, I researched and built models to extract hypernym-hyponym relationships from text. I evaluated the state of the art in this domain and built statistical and machine learning models to extract these relationships. I also published a free dataset and a paper on this topic in a leading language resource journal. (Paper, Repository)
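One classic rule-based ingredient for hypernym-hyponym extraction is Hearst patterns; a minimal sketch of the "X such as Y" pattern is below (the published models were statistical and considerably more sophisticated than this single regex):

```python
import re

# "X such as A, B and C" => X is a hypernym of A, B, and C.
HEARST_SUCH_AS = re.compile(
    r"(\w+)\s+such\s+as\s+((?:\w+(?:,\s*|\s+and\s+)?)+)"
)


def hearst_pairs(text):
    """Extract (hypernym, hyponym) pairs via the 'such as' Hearst pattern."""
    pairs = []
    for m in HEARST_SUCH_AS.finditer(text):
        hypernym = m.group(1)
        # Split the enumeration "A, B and C" into individual hyponyms.
        hyponyms = re.split(r",\s*|\s+and\s+", m.group(2))
        pairs.extend((hypernym, h) for h in hyponyms if h)
    return pairs
```

Other Hearst patterns ("X, including Y", "Y and other X") follow the same extract-and-split shape; the hard part, and the focus of the research, is ranking and filtering the noisy candidate pairs these patterns produce.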
At Voicemod, I built a range of models to understand user behavior and product usage.
This involved