Tamoghna Chakraborty
929-***-**** ******@***.*** https://github.com/tc3142 www.linkedin.com/in/tc3142
EDUCATION
New York University Tandon School of Engineering, M.S., Computer Engineering, 2022 (GPA: 3.5) New York, NY
Relevant Coursework: Big Data, Deep Learning, NLP, Advanced ML, Internet Architecture Protocols, Probability and Stochastic Processes
SRM Institute of Science and Technology, B.Tech, Electronics and Communication Engineering India
SKILLS & INTERESTS
● Programming languages: Python, Java, Spark, SQL, R
● Machine learning & deep learning: supervised learning (regression, classification), unsupervised learning (clustering such as k-means, dimensionality reduction such as PCA), NLP, RNNs, CNNs
● Cloud: AWS, Azure, GCP
● Analytical: Statistical analysis, data mining, experiment design
● Network protocols and technologies: DNS, BGP, SNMP, OSPF, PING, DHCP, TCP/IP, OSI, HTTP
● Business analytics tools: Power BI, Tableau
● Certifications: LinkedIn Advanced SQL for Data Scientists, Coursera Python Object-Oriented Programming
EXPERIENCE
Junior Software Engineer
Marlabs LLC, Piscataway, NJ July 2022 - Present
● Designed and implemented efficient pipelines for migrating millions of records from on-prem databases to S3, boosting performance by over 15% and reducing costs.
● Automated ETL processes, reducing manual workload by 10% monthly.
● Maintained 99.8% data uptime while integrating multiple data sources.
● Participated in daily stand-ups and sprint planning, and collaborated with business teams to deliver features on time using Agile methodologies.
● Rectified critical codebase errors and optimized algorithms, enhancing application performance by 40% and reducing load times by 25%.
PROJECTS
Automated Job Listings Update (Python 3, PostgreSQL, Spark 3, Amazon Redshift, Scala, AWS: CLI, EC2, S3, Glue, Lambda, CloudFormation, CloudWatch, Tableau)
● Transferred data from on-premises to S3 via EC2.
● Used Redshift to preprocess and process the data, calculating metrics such as job listing counts and average salaries.
● Set up CloudWatch alerts to auto-scale EC2 and manage database and S3 updates.
● Implemented CloudFormation for resource organization and deployment.
● Created a Tableau dashboard to visualize statistics, improving hiring policy insights.
Financial Fraud Detection (Python, NumPy, pandas, Matplotlib, scikit-learn, TensorFlow)
● Analyzed 10 million financial transactions to uncover fraudulent activity using Python and Spark.
● Conducted extensive feature engineering on transaction attributes, enhancing model accuracy.
● Achieved precision and recall rates of approximately 91% and 88% respectively across various algorithms.
● Deployed the model on a Spark cluster using AWS EMR to process real-time data.
● Implemented continuous monitoring of the EMR cluster using AWS CloudWatch for seamless operation.
Sentiment Analysis System Development (Natural Language Processing, Python, Keras, AWS, Kafka, Spark)
● Collaborated with a data science team to build a sentiment analyzer that reached approximately 95% accuracy on client and user reviews.
● Engineered deep learning models (LSTM, BERT) to perform sentiment analysis and topic modeling on extensive review datasets.
● Spearheaded the establishment of a real-time data pipeline for continuous sentiment analysis model updates, leveraging Apache Kafka and Spark Streaming.
● Employed AWS cloud services to seamlessly scale the model for production use.
Collaborative Filtering-Based Recommender System (Python, PySpark, SQLite)
● Conducted Exploratory Data Analysis to pinpoint crucial dataset attributes and select pertinent fields.
● Imported data from cloud storage into Spark DataFrames using Databricks with PySpark.
● Performed feature engineering: eliminated irrelevant columns, converted to Parquet format, hashed columns for partitioning, and repartitioned for enhanced parallelization.
● Utilized PySpark SQL to create a window specification and rank validation data.
● Trained the model using alternating least squares (ALS) from the PySpark MLlib package.
● Optimized hyperparameters and assessed model performance using MAP (Mean Average Precision).