Data Science | Pulkit Goel

I am a Master's candidate at Carnegie Mellon University's School of Computer Science, majoring in Computational Data Science. I am currently looking for full-time roles in Deep Learning and Machine Learning.

My major areas of interest include Natural Language Generation and Recommendation Systems. I have worked on applications of Machine Learning in multiple areas, such as Natural Language Processing, Computer Vision, and Finance.

As part of my master's capstone project, I am currently working with Bosch Research on generating natural language from unstructured datasets such as a list of words.

I am also part of Professor Fei Fang's team at CMU, where we are building a petition recommendation model for social media posts. Our team won the best paper award for this work at Harvard's AI for Social Impact Workshop 2020.

EXPERIENCE

2020-2020

Data Science Intern

CLOUDFLARE

I work with the Business Intelligence team, developing models to recommend products to users based on their traffic statistics and demographics.

2017-2019

Quantitative Associate

GOLDMAN SACHS

I worked with the Risk Informatics team based out of Bengaluru, India. My job included developing models to identify patterns in trading and market data and to identify avenues of market risk, with a special focus on the Commodities asset class.

2017-2018

Senior Member Technical

ARCESIUM (D.E. SHAW GROUP)

I worked with the Trade Reconciliation team as a full-stack developer, focusing on data engineering for the trade reconciliation platform. A side project of mine aimed to deploy a blockchain for simulating smart contract transactions.

EDUCATION

2019-2020

Master of Science

CARNEGIE MELLON UNIVERSITY

SCHOOL OF COMPUTER SCIENCE

Computational Data Science

2011-2015

Bachelor of Technology

INDIAN INSTITUTE OF TECHNOLOGY (BHU) VARANASI

Computer Science and Engineering

COURSEWORK

Deep Learning

Cloud Computing

Computational Ethics for NLP

Multilingual Natural Language Processing

Machine Learning

Neural Networks for NLP

Multimodal Machine Learning

Interactive Data Science

PROJECTS

Topic Classification in Speech Processing

Trained a feedforward neural network to determine phoneme states from mel spectrogram frames of speech recordings
Trained a CNN in PyTorch and tuned its hyperparameters for topic classification using GloVe word embeddings of the generated text

Anger to Constructive Criticism on Social Media

This project aims to capture public anger on Twitter regarding social issues and convert it into constructive criticism by recommending relevant petitions to users. Awarded Best Poster at Harvard's AI for Social Impact Workshop 2020
Trained an ensemble of SVM, Naïve Bayes, and CNN models to classify tweets, and recommended petitions using Bag of Words

Contextual Natural Language Generation

Deploying a language model (UniLM) for sentence generation from a given set of words from the CommonGen dataset, using attention-based approaches for commonsense injection
Seq2seq modeling for text generation from structured datasets (such as WikiBio) using autoencoders & prototype edit methods

Attention-based Speech-to-Text Generation

Trained a Pyramidal Bi-LSTM based Encoder-Decoder architecture to generate text for given speech utterances
Experimented with techniques such as attention injection, Gumbel noise, teacher forcing, and beam search

Anger to Constructive Criticism on Social Media

Mined tweets regarding social issues and trained a BERT-based neural model to classify tweets for hate speech and toxicity
Performed topic modeling for theme detection and implemented a petition recommender system based on Bag of Words
Awarded Best Poster at Harvard's AI for Social Impact Workshop 2020

Face Classification and Verification

Trained & compared CNN-based architectures (MobileNet, AlexNet, ResNet variants) for face classification
Experimented with Cross Entropy, Triplet & Center loss functions, with a max verification accuracy of 0.93 on CelebA dataset

Big Data Analytics on Twitter data

Designed a scalable friend recommender system on ~1 TB of user data, hosted on a SQL DBMS & tested on live queries
Performed MapReduce on AWS EMR for ETL & used TF-IDF for tweet analysis & PageRank for social graph analysis

Bias Identification and Mitigation in Text

Identified social bias in MultiNLI & SNLI datasets by PMI scoring & obfuscated bias using context-based unigram replacement
Evaluated bias in word embeddings (GloVe & polyglot) using WEAT & proposed adversarial training to debias embeddings

Ride-sharing Service ML Pipeline

Implemented an end-to-end ML Pipeline to match cab riders with drivers by deploying GCP ML APIs on Google App Engine
Predicted cab fares by training an XGBoost model with feature engineering and tuning its hyperparameters on Google AI Platform

Multilingual POS Tagging

Implemented a BiLSTM for POS tagging across 8 languages and experimented with GloVe, FastText & polyglot word embeddings to improve performance in a multilingual setup

YouTube Trend Analytics

Designed and implemented an analytical model to perform exploratory & statistical analysis on YouTube trending data
The model used linear regression to analyze the factors that cause videos to trend and to identify biases in the data
Visualized the results using Tableau and developed a website showcasing the study

CONTACT ME

Pulkit Goel

Language Technologies Institute

School of Computer Science - Carnegie Mellon University

Phone:

412-628-2010

Email:

pulkitgo@cs.cmu.edu

pulkit.26mar@gmail.com
