Tech Stack: Python, LSTM (Long Short-Term Memory), Yahoo Finance API, Streamlit.
Overview: The app combines real-time stock data with AI-driven forecasting and dashboards to give users actionable insights.
Methodology: Built LSTM neural networks to predict 5-day stock price movements.
Created an interactive dashboard with technical indicators (e.g., EMA, RSI, OBV) using Plotly (indicator computation sketched below).
Impact: Deployed an intuitive platform for investors to make data-driven decisions through predictive analytics and dynamic trend analysis.
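A minimal sketch of the indicator computation, assuming yfinance, pandas, and Plotly; the ticker, look-back period, and window lengths are illustrative choices, not the app's exact settings.
```python
# Sketch: pull prices and derive EMA, RSI, and OBV for the dashboard (assumed settings).
import numpy as np
import yfinance as yf
import plotly.graph_objects as go

hist = yf.Ticker("AAPL").history(period="1y")       # hypothetical ticker and period
close, volume = hist["Close"], hist["Volume"]

ema20 = close.ewm(span=20, adjust=False).mean()     # 20-day exponential moving average

delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rsi = 100 - 100 / (1 + gain / loss)                 # 14-day RSI (simple-average variant)

obv = (np.sign(delta).fillna(0) * volume).cumsum()  # on-balance volume

fig = go.Figure([go.Scatter(x=close.index, y=rsi, name="RSI(14)")])
fig.show()                                          # in the app, rendered via Streamlit instead
```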
----------------------------------------------------------------------------
Justice Forecast App: Solvability Analysis for Homicide Cases
Tech Stack: Python, LightGBM, Pandas, Matplotlib, Streamlit.
Overview: The Justice Forecast App is an end-to-end machine learning application that predicts the solvability of homicide cases.
The project emphasizes the importance of accounting for unsolved homicides and understanding the key factors impacting case solvability.
Methodology: Utilized LightGBM to model solvability from critical factors such as crime circumstances, homicide year, and victim age (see the sketch below).
Product: The interactive app lets users explore how each factor affects outcomes and understand the predictions in real time.
Impact: Designed and deployed a machine learning model, transforming complex algorithms into a practical, user-friendly application.
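A minimal sketch of the solvability classifier; the CSV source, column names (circumstance, year, victim_age, solved), and hyperparameters are assumptions, not the app's exact setup.
```python
# Sketch: LightGBM classifier on assumed tabular homicide data.
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("homicides.csv")                           # hypothetical dataset
df["circumstance"] = df["circumstance"].astype("category")  # category dtype -> native categorical handling

X = df[["circumstance", "year", "victim_age"]]
y = df["solved"]                                            # 1 = solved, 0 = unsolved

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)

print("held-out accuracy:", model.score(X_te, y_te))
print(dict(zip(X.columns, model.feature_importances_)))     # which factors drive the prediction
```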
Sentiment Analysis of IMDb Movie Reviews with BERT
Objective: Developed a model to classify IMDb movie reviews into positive, negative, and mixed categories.
Methodology: Fine-tuned a pre-trained BERT model and implemented a rule-based function over the model logits to detect mixed reviews (sketched below).
Tools/Technologies: Hugging Face Transformers, PyTorch for modeling, Scikit-Learn for evaluation, Matplotlib for visualization.
Impact: Achieved 92% classification accuracy, showcasing a practical application of sentiment analysis.
Communication: Published the results in a Medium article to explain the approach to a broader audience.
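A minimal sketch of the logit-based rule for mixed reviews, using a public SST-2 checkpoint as a stand-in for the fine-tuned model; the confidence margin is an assumed value, not the article's exact threshold.
```python
# Sketch: label a review "mixed" when the positive/negative probabilities are close.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"   # stand-in for the fine-tuned model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def classify(review: str, margin: float = 0.2) -> str:
    inputs = tok(review, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze()        # index 0 = negative, 1 = positive
    if abs(probs[1] - probs[0]) < margin:                  # no clear lean -> "mixed"
        return "mixed"
    return "positive" if probs[1] > probs[0] else "negative"

print(classify("Great acting, but the plot dragged badly."))
```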
Multivariate Time Series Forecast via Neural Networks: Apple and Google Stocks
GridSearchCV, Deep Learning, LSTM, Recurrent Neural Network (RNN)
Objective: Build a deep learning model to forecast stock prices from historical multivariate time series data.
Methodology: Applied domain-based feature engineering, a sliding-window function, and normalization (see the sketch below).
Built and fine-tuned LSTM neural networks.
Tools/Technologies: Used Pandas for data processing, Matplotlib/Seaborn for visualization, Keras for LSTM model development.
Impact: Achieved robust performance with 0.02% MAPE and 98% R², demonstrating reliable deep-learning forecasting in a volatile market.
Communication: Published the project on GitHub for open-source collaboration and sharing.
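A minimal sketch of the windowing-plus-LSTM setup, assuming Keras and scikit-learn; the window length, layer sizes, and placeholder feature matrix are illustrative, not the project's tuned configuration.
```python
# Sketch: sliding windows over scaled multivariate features feeding a small Keras LSTM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras

def make_windows(data: np.ndarray, window: int = 30):
    X, y = [], []
    for i in range(len(data) - window):
        X.append(data[i:i + window])     # past `window` days of all features
        y.append(data[i + window, 0])    # next-day target (column 0, e.g. close price)
    return np.array(X), np.array(y)

features = np.random.rand(500, 4)        # placeholder for the engineered feature matrix
scaled = MinMaxScaler().fit_transform(features)
X, y = make_windows(scaled)

model = keras.Sequential([
    keras.Input(shape=X.shape[1:]),      # (window, n_features)
    keras.layers.LSTM(64),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.1)
```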
SQL Database & Data Analysis: Tableau Dashboards on the Global Covid-19 Burden over the Years
SQL, Relational Database, Tableau, Dashboard
Objective: Conduct a comprehensive analysis of Covid-19 data to uncover key trends in cases, deaths, and vaccination progress across countries.
Methodology: Utilized SQL queries for ETL and analysis, and a BI tool to explore the data temporally and spatially (query sketched below).
Tools/Technologies: Used PostgreSQL for database management and querying, Tableau for data visualization.
Impact: Generated interactive dashboards that provided findings on global Covid-19 trends, comparisons between countries, and the pandemic's progression over time.
Communication: Published the analysis results as interactive dashboards on the Tableau website, making the findings accessible for further exploration and decision-making.
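A minimal sketch of the kind of query used, run from Python via SQLAlchemy and pandas; the connection string, table names (covid_deaths, covid_vaccinations), and columns are assumed, not the actual schema.
```python
# Sketch: PostgreSQL query computing death rate and a rolling vaccination total per country.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@localhost/covid")   # hypothetical connection

query = """
SELECT d.location,
       d.date,
       d.total_cases,
       d.total_deaths,
       ROUND(d.total_deaths::numeric / NULLIF(d.total_cases, 0) * 100, 2) AS death_rate_pct,
       SUM(v.new_vaccinations) OVER (PARTITION BY d.location ORDER BY d.date) AS rolling_vaccinations
FROM covid_deaths d
JOIN covid_vaccinations v
  ON d.location = v.location AND d.date = v.date
WHERE d.continent IS NOT NULL
ORDER BY d.location, d.date;
"""

trends = pd.read_sql(query, engine)   # result exported and visualized in Tableau
```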
Decision Tree and Random Forest Assisted Suggestions for Employee Retention
Employee Churn, Decision Trees, Ensemble Learning, Machine Learning
Objective: Analyze HR data to predict employee churn and identify the incentives behind employees' decisions to leave or stay.
Methodology: Applied decision tree, random forest, and logistic regression models (see the sketch below).
Tools/Technologies: Used Pandas for data manipulation, Matplotlib for data visualization, Scikit-Learn for model implementation, GridSearch for hyperparameter tuning.
Impact: Produced a random forest model with a 94% F1-score and identified the features most strongly influencing employee churn.
Communication: Visualized model performance through feature importance plots and confusion matrix for model interpretability.
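A minimal sketch of the tuned random forest, assuming an HR dataset with a "left" churn label; the file name, parameter grid, and feature handling are illustrative, not the project's exact setup.
```python
# Sketch: random forest tuned with GridSearchCV, scored on F1, plus top feature importances.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score

hr = pd.read_csv("hr_data.csv")                       # hypothetical HR dataset
X = pd.get_dummies(hr.drop(columns=["left"]))         # "left" = churn label
y = hr["left"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [5, None]},
    scoring="f1", cv=5,
)
grid.fit(X_tr, y_tr)

print("test F1:", f1_score(y_te, grid.predict(X_te)))
print(sorted(zip(grid.best_estimator_.feature_importances_, X.columns), reverse=True)[:5])
```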
Gradient Boosting Predictive Model for TikTok's Claim Classification: Hypothesis Testing, Logistic Regression, Tree-Based Models
XGBoost, GridSearchCV, Classification, Exploratory Data Analysis (EDA)
Objective: Develop a machine learning model to classify videos as either claims or opinions.
Methodology: Conducted EDA and hypothesis testing, built a logistic regression model, and developed tree-based classification models (see the sketch below).
Tools/Technologies: Used Pandas for data manipulation, Matplotlib/Seaborn for data visualization, Scikit-Learn for regression and ML model implementation, GridSearch for hyperparameter tuning.
Impact: Achieved a 99% F1-score on claim classification and optimized user-submission workflows.
Communication: Documented the project in executive summaries and shared it on GitHub.
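A minimal sketch of the tuned XGBoost classifier; the file name, engagement-feature columns, and parameter grid are assumptions, not the project's exact configuration.
```python
# Sketch: XGBoost claim/opinion classifier tuned with GridSearchCV on assumed features.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

data = pd.read_csv("tiktok_dataset.csv")                      # hypothetical file name
X = data[["video_duration_sec", "video_view_count",
          "video_like_count", "video_comment_count"]]         # assumed engagement features
y = (data["claim_status"] == "claim").astype(int)             # 1 = claim, 0 = opinion

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    {"max_depth": [4, 6], "learning_rate": [0.1, 0.3], "n_estimators": [200, 400]},
    scoring="f1", cv=5,
)
grid.fit(X_tr, y_tr)

print(grid.best_params_, "test F1:", grid.score(X_te, y_te))
```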
Feature Engineering on Study Mobility: Dashboards on Students' Preferences
Python, Geographic Data Analysis, Tableau, Dashboards
Objective: Identify study-mobility patterns, anomalies, and trends over a decade.
Methodology: Conducted data cleaning, data structuring, and feature engineering, and built interactive graphs and dashboards (feature-engineering step sketched below).
Tools/Technologies: Used Pandas for data manipulation, Tableau for data visualization.
Impact: Explored global student mobility patterns and trends over time, providing valuable insights into educational preferences across nations.
Communication: Published interactive dashboards on Tableau website, making the analysis intuitive and accessible.
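A minimal sketch of the feature-engineering step, assuming a tidy origin/destination/year table of outbound students; the file and column names are hypothetical stand-ins for the actual source.
```python
# Sketch: derive decade and per-origin share features, then export for the Tableau dashboards.
import pandas as pd

mob = pd.read_csv("student_mobility.csv")              # hypothetical source file
mob = mob.dropna(subset=["origin", "destination", "students"])

mob["decade"] = (mob["year"] // 10) * 10               # decade bucket for trend views
mob["share_of_origin"] = mob["students"] / mob.groupby(["origin", "year"])["students"].transform("sum")

top_destinations = (mob.groupby(["decade", "destination"])["students"]
                       .sum()
                       .sort_values(ascending=False)
                       .groupby(level=0)
                       .head(5))                       # top 5 destinations per decade
print(top_destinations)

mob.to_csv("mobility_features.csv", index=False)       # exported for the Tableau dashboards
```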