Professional Skills
Programming & Tools: Python, R, R Shiny, Java, HTML, CSS, JavaScript, SQL, Tableau, GitHub
Data Science: Machine Learning, Predictive Modeling, Natural Language Processing, Causal Inference, Survival Analysis, Time Series, Visualization
Education
- M.S. in Statistical Practice, Carnegie Mellon University (4.0 GPA)
- B.S. in Statistics & Mathematics, Washington University
Professional Experience
Twitter, Inc.
May 2016 – Present
Data Scientist
- Develop supervised and unsupervised machine learning models (e.g., Random Forest, SVM, Hierarchical Clustering, PCA) and advanced statistical models (e.g., Regression, Time Series, Survival Analysis) to interpret complex data and guide people and business decisions
- Lead research and analytics, collaborating with stakeholders across Twitter on attrition analysis, productivity modeling, workforce planning, employee engagement and assessment, performance, and behavior
- Establish Natural Language Processing frameworks to extract insights from unstructured text data
- Collaborate with IT and stakeholders to build ETL pipelines
- Build interactive Tableau dashboards that present meaningful visualizations for clearer storytelling
- Speak at conferences for branding and networking
University of Pittsburgh
May 2015 – May 2016
Statistician
- Performed Monte Carlo simulations and Logistic Regression to assess how well Instrumental Variable, Propensity Score Matching, Stratification, and Covariate Adjustment methods estimate the marginal odds ratio
- Applied Survival Analysis, the Cox Proportional Hazards Model, the Log-Rank Test, and Relative Risk Tests to predict an intestinal graft donor risk index
- Investigated complications associated with total kidney volume growth and renal function decline
Shanghai Hwabao Securities Co., Ltd.
July 2015 – April 2016
Data Analyst
- Quantified the magnitude and probability of investment risk and potential portfolio losses using a Value at Risk (VaR) model
- Derived variances, standard deviations, and covariances from historical datasets
- Data Mining Project: Advertising Image Classification
  - Classified advertising images using data mining techniques, achieving 98.2% classification accuracy
  - Implemented five supervised algorithms: Random Forest, AdaBoost, SVM, K-Nearest Neighbors, and Lasso Logistic Regression
- Prediction of Listeners’ Identification of Music
  - Developed Hierarchical Bayes models to predict whether listeners identify music as “classical” or “popular”
  - Obtained posterior estimates by running Markov Chain Monte Carlo (MCMC) sampling with JAGS/RUBE in R
- The Impact of the Open Learning Initiative on Instructors and Students
  - Used SQL to clean and export analysis datasets from over 50 unstructured raw datasets
  - Performed Exploratory Data Analysis to discover trends and patterns in users’ usage and performance
  - Collaborated with clients to discuss data problems such as missing or unrealistic measurements
Chicago Crime Department
June 2012 – June 2014
Researcher
- Analyzed domestic crime and arrest rates across Chicago’s 77 neighborhoods by location, crime type, and time
- Built an interactive R application using rCharts and Shiny to display pie charts, bar charts, and time series plots
- Created maps comparing domestic crime and arrest rates across Chicago neighborhoods and time periods