Hi, my name is

Jonathan Sucuc.

I turn data into insights.

I'm a data scientist who builds predictive models and end-to-end ML pipelines. I specialize in turning messy, complex data into clear, actionable insights.

View My Work

About Me

Now that I have your full attention, let me tell you how my journey began. I got hooked on data science through mathematical modeling competitions and incredible research projects at Columbia University, where I discovered that the right algorithm could unlock patterns hidden in massive datasets.

Since then, I've worked across research labs building predictive models for everything from NYC real estate trends to coastal storm surge forecasting. I love the entire ML pipeline—wrangling messy data, engineering features, training ensemble models, and validating results with rigorous statistical tests.

Whether it's achieving 92% accuracy on classification tasks or optimizing time series forecasts, I'm driven by turning complex data into actionable insights. My work spans urban analytics, climate modeling, and economic prediction—always focused on building robust, interpretable models.

Here are some technologies I've been working with recently:

  • Python, R (Tidyverse)
  • Machine Learning
  • SQL & Database Design
  • Time Series Modeling
  • Statistical Testing
  • Git & Version Control

Research Experience

Mathematical Modeling & Machine Learning Researcher @ Columbia University

June 2021 - May 2024

  • Developed ensemble machine learning models (Random Forest, XGBoost) achieving up to 92% prediction accuracy on Mathematical Contest in Modeling (MCM) datasets
  • Engineered feature pipelines with dimensionality reduction (PCA) reducing feature space by 30% while maintaining model performance
  • Conducted rigorous statistical validation using F-tests and ANOVA to assess model significance and feature importance across multivariable regression frameworks
  • Built interactive visualizations and analytical dashboards using Matplotlib and Seaborn for stakeholder presentations

Data Science Researcher @ Professor Ali Hirsa's Lab

September 2023 - January 2024

  • Architected ETL pipeline transforming 500K+ Yelp Business API records into normalized SQL database schema
  • Performed spatial-temporal analysis revealing correlation between business turnover rates and property value changes
  • Created longitudinal tracking system for business evolution patterns across NYC neighborhoods

Computational Research Assistant @ Professor Kyle Mandli's Lab

May 2021 - June 2022

  • Automated data acquisition pipeline using BeautifulSoup and API integration to extract 10+ years of NOAA tide and storm surge observations
  • Optimized tidal prediction algorithms in Clawpack through ensemble time series techniques
  • Performed statistical analysis on storm surge patterns using Pandas for data wrangling and time series decomposition
  • Contributed to open-source climate modeling tools with improvements to coastal flooding predictions

Some Things I've Built

Featured Project

COVID-19 & NYC Property Values

An end-to-end geospatial analysis examining whether population density became a housing price penalty after COVID-19. Built parcel-to-district crosswalks, performed area-weighted spatial joins, and conducted regression analysis across 59 NYC community districts.
  • R (VIF, tidycensus, leaflet)
  • PLUTO & ACS Data
  • Regression Analysis (ANOVA)

Featured Project

Housing Affordability Analysis

Metro-level affordability study exploring rent burden and housing growth dynamics. Constructed standardized indices, compared CBSA-level patterns, and developed policy recommendations for YIMBY-style housing incentives.
  • R (tidyverse, tidycensus)
  • Policy Analysis
  • Data Visualization (ggplot2)

Featured Project

NYC Green Canopy Visualization

Geospatial study of NYC's 600,000+ street trees across City Council Districts. Performed point-in-polygon spatial joins, mapped tree health conditions, and identified environmental equity gaps to guide urban forestry interventions.
  • R (httr2, rvest, sf)
  • NYC Open Data
  • Spatial Analysis (ggplot2, leaflet)

What's Next?

Get In Touch

I'm currently looking for new opportunities in data science and spatial analytics. Whether you have a question or just want to say hi, I'll do my best to get back to you!

Say Hello

Beyond Data Science

When I'm not building models, I'm passionate about giving back to my community and staying active. Here are some things I do outside of work:

Community Service

Sous Chef & Food Program Coordinator @ Broadway Community, Inc.

September 2023 - Present

  • Managed meal logistics for 250+ homeless and low-income guests weekly
  • Mentored 300+ volunteers and contributed 500+ hours to food distribution

View Recipes & Stories →

Education & Mentoring

Residential Counselor @ Academic Success Program, Columbia University

Summer 2023

  • Coordinated support services for 65 first-year students, tracking engagement and academic performance metrics
  • Developed structured tutoring schedules and organized community-building activities

Hobbies & Interests

  • Photography - Capturing portraits, graduation photos, and urban landscapes
  • Cycling - Exploring NYC's bike trails and neighborhoods
  • Tutoring - Mathematics, Statistics, and Spanish for grade school students