Jacob Johnson - Data Scientist

Jacob Johnson

Data Scientist | NLP Enthusiast | Computer Vision Explorer

About Me

As a seasoned data analyst, I've honed my skills in extracting insights from complex datasets, and I'm passionate about turning data into actionable knowledge. That experience has solidified my commitment to data-driven decision-making and forms a strong foundation for my next step: transitioning into Data Science and Machine Learning. I aim to apply advanced analytics, predictive modeling, and AI technologies to uncover deeper insights from data. This portfolio showcases projects that reflect my skill set, from Natural Language Processing (NLP) to Computer Vision.

Highlighted Projects

LLM Question-Answering Application

This LLM Question-Answering Application offers a user-friendly interface for extracting insights from documents. Embeddings are generated at no cost with the all-MiniLM-L6-v2 model from Hugging Face and stored in FAISS, a free, open-source vector store. Answers are generated in real time by OpenAI's gpt-3.5-turbo. The app can be found here: LLM Question-Answering Application
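The retrieval step behind this kind of application can be sketched in a few lines. This is a minimal illustration and not the project's code: the toy 3-dimensional vectors stand in for all-MiniLM-L6-v2 embeddings, a brute-force cosine-similarity search stands in for the FAISS index, and the names cosine_similarity and top_k_chunks are hypothetical helpers introduced only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query.

    Brute-force search; a vector store such as FAISS plays this role
    at scale (assumption: the real app delegates this step to FAISS).
    """
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real document-chunk embeddings.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

best = top_k_chunks(query_vec, chunk_vecs, k=2)
print(best)
```

The text of the selected chunks would then be passed to the language model as context alongside the user's question.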

YOLO-NAS & EasyOCR Automatic Number Plate Recognition

This project uses YOLO-NAS to detect license plates and EasyOCR to perform Optical Character Recognition (OCR) on them. It supports both image and video processing and has been deployed as a Streamlit web application. This is an update to a previous project, Optical-Character-Recognition-WebApp.

Sentiment Analysis with DistilBERT + Streamlit

Here we use a subset of the amazon_polarity dataset to train two machine learning models: an LSTM model with GloVe embeddings and a fine-tuned DistilBERT model. The LSTM model achieved an accuracy of 80.40%, while the DistilBERT model outperformed it with 90.75% accuracy. Predictions can be made in real time via our Streamlit app: Sentiment Analysis with DistilBERT + Streamlit

Vehicle Detection + Tracking App

A Streamlit web application for vehicle tracking using different state-of-the-art (SOTA) object detection models. The app offers two options: YOLO-NAS with SORT tracking, and YOLOv8 with ByteTrack and Supervision tracking. Users can upload a video file, set confidence levels, and visualize the tracking results in real time. Code: Vehicle Detection + Tracking App

Question Answering App with BERT and Flask

This project demonstrates a user-friendly web application that uses a pre-trained BERT-based model to answer questions based on a given passage. The app is built using Python, the transformers library for BERT, Flask for the web framework, and HTML/CSS for the interactive user interface.

Credit Card Default Web App

This project predicts credit card defaults based on customer profiles, achieving a ROC AUC score of 0.7882. The model analyzes borrower information, such as age, income, and financial indicators, to identify customers at risk of defaulting. The project also includes a Streamlit web app that makes predictions for a given customer profile. Streamlit App
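For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted risk than a randomly chosen non-defaulter. The sketch below computes it directly from that definition on made-up labels and scores. It is only an illustration: the data is invented, the function name roc_auc is a hypothetical helper, and the project itself presumably relies on a library implementation.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise definition.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = defaulted, 0 = did not default.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(labels, scores))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.7882 reported above.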

Telecom Churn Analysis and Prediction

This repository contains the code for analyzing telecom customer churn. The aim of the project is to predict whether a customer will churn based on various features. We compared several machine learning algorithms; the best-performing model was XGBoost, with an accuracy of 81.92%. We also performed SHAP analysis to interpret the XGBoost model and found that MonthlyCharges, Tenure, and InternetService_Fiber optic were the most important features in predicting churn. Tableau Dashboard


Data Analysis

Skilled in data analysis, statistical modeling, data visualization, and designing insightful dashboards to drive data-informed decisions.


Machine Learning

Proficient in building and deploying machine learning models for predictive analytics and pattern recognition.



Experienced in implementing end-to-end machine learning pipelines and managing model deployment.

Amazon SageMaker

Natural Language Processing (NLP)

Proficient in using NLP techniques for text processing, sentiment analysis, and large language modeling.

Huggingface LLMs

Computer Vision

Experience with deep learning frameworks for image recognition, object detection, image segmentation and image generation.


Areas of Interest