I am a Master's candidate in Computational Data Science at Carnegie Mellon University's School of Computer Science. I am currently looking for full-time roles in Deep Learning and Machine Learning.
​
My major areas of interest include Natural Language Generation and Recommendation Systems. I have worked on applications of Machine Learning in multiple areas, such as Natural Language Processing, Computer Vision, and Finance.
As part of my master's capstone project, I am currently working with Bosch Research on generating natural language from unstructured datasets such as a list of words.
​
I am also part of Professor Fei Fang's team at CMU, where we are building a petition recommendation model for social media posts. Our team won the best paper award for this work at Harvard's AI for Social Impact Workshop 2020.
EXPERIENCE
2020-2020
Data Science Intern
CLOUDFLARE
I work with the Business Intelligence team, developing models to recommend products to users based on their traffic statistics and demographics.
2017-2019
Quantitative Associate
GOLDMAN SACHS
I worked with the Risk Informatics team based in Bengaluru, India. My job included developing models to identify patterns in trading and market data and to identify avenues of market risk, with a special focus on the Commodities asset class.
2017-2018
Senior Member Technical
ARCESIUM (D.E. SHAW GROUP)
I worked with the Trade Reconciliation team as a full-stack developer, focusing on data engineering for the trade reconciliation platform. A side project of mine aimed to deploy a blockchain for simulating smart contract transactions.
EDUCATION
2019-2020
Master of Science
CARNEGIE MELLON UNIVERSITY
SCHOOL OF COMPUTER SCIENCE
Computational Data Science
2011-2015
Bachelor of Technology
INDIAN INSTITUTE OF TECHNOLOGY (BHU) VARANASI
Computer Science and Engineering
COURSEWORK

Deep Learning
Cloud Computing
Computational Ethics for NLP
Multilingual Natural Language Processing
Machine Learning
Neural Networks for NLP
Multimodal Machine Learning
Interactive Data Science
PROJECTS
Topic Classification in Speech Processing
-
Trained a feedforward neural network to determine phoneme states from mel spectrogram frames of speech recordings
-
Trained a CNN in PyTorch and tuned its hyperparameters for topic classification using GloVe word embeddings of the generated text
Anger to Constructive Criticism on Social Media
-
The project aims to capture public anger on Twitter regarding social issues and convert it into constructive criticism by recommending relevant petitions to users. Awarded best poster at Harvard's AI for Social Impact Workshop 2020
-
Trained an ensemble model of SVM, Naïve Bayes, and CNN to classify tweets and recommended petitions using Bag of Words
Contextual Natural Language Generation
-
Deploying a language model (UniLM) for sentence generation from a given set of words from the CommonGen dataset, using attention-based approaches for commonsense injection
-
Seq2seq modeling for text generation from structured datasets (such as WikiBio) using autoencoders & prototype edit methods
Attention-based Speech-to-Text Generation
-
Trained a pyramidal Bi-LSTM-based encoder-decoder architecture to generate text for given speech utterances
-
Experimented with concepts like attention injection, Gumbel noise, teacher forcing, and beam search
Anger to Constructive Criticism on Social Media
-
Mined tweets regarding social issues and trained a BERT-based neural model to classify tweets for hate speech and toxicity
-
Performed topic modelling for theme detection and implemented a petition recommender system based on Bag of Words
-
Awarded best poster at Harvard's AI for Social Impact Workshop 2020
Face Classification and Verification
-
Trained & comparatively analyzed CNN-based architectures (MobileNet, AlexNet, ResNet variants) for face classification
-
Experimented with cross-entropy, triplet & center loss functions, with a max verification accuracy of 0.93 on the CelebA dataset
Big Data Analytics on Twitter data
-
Designed a scalable friend recommender system on ~1 TB of user data, hosted on a SQL DBMS & tested on live queries
-
Performed MapReduce on AWS EMR for ETL & used TF-IDF for tweet analysis & PageRank for social graph analysis
Bias Identification and Mitigation in Text
-
Identified social bias in MultiNLI & SNLI datasets by PMI scoring & obfuscated bias using context-based unigram replacement
-
Evaluated bias in word embeddings (GloVe & polyglot) using WEAT & proposed adversarial training to debias embeddings
Ride-sharing Service ML Pipeline
-
Implemented an end-to-end ML Pipeline to match cab riders with drivers by deploying GCP ML APIs on Google App Engine
-
Predicted cab fares by training an XGBoost model with feature engineering and tuning its hyperparameters on Google AI Platform
Multilingual POS Tagging
-
Implemented a BiLSTM for POS tagging across 8 languages and experimented with GloVe, FastText & polyglot word embeddings to improve performance in a multilingual setup
YouTube Trend Analytics
-
Designed and implemented an analytical model to perform exploratory & statistical analysis on YouTube trending data
-
The model used linear regression to analyze the factors that cause videos to trend and to identify biases in the data
-
Visualized the results using Tableau and developed a website showcasing the study
CONTACT ME
Pulkit Goel
Language Technologies Institute
School of Computer Science - Carnegie Mellon University
​
Phone:
412-628-2010