Data engineering

Location:

Brooklyn, NY

Posted:

June 18, 2024

Resume:

Tamoghna Chakraborty

929-***-**** ******@***.*** https://github.com/tc3142 www.linkedin.com/in/tc3142

EDUCATION

New York University Tandon School of Engineering, M.S., Computer Engineering, 2022 (GPA: 3.5), New York, NY

Relevant Coursework: Big Data, Deep Learning, NLP, Advanced ML, Internet Architecture Protocols, Probability and Stochastic Processes

SRM Institute of Science and Technology, B.Tech, Electronics and Communication Engineering, India

SKILLS & INTERESTS

● Programming languages: Python, Java, Spark, SQL, R

● Machine learning & deep learning: supervised learning (regression, classification), unsupervised learning (clustering such as k-means, dimensionality reduction such as PCA), NLP, RNNs, CNNs

● Cloud: AWS, Azure, GCP

● Analytical: Statistical analysis, data mining, experiment design

● Network protocols and technologies: DNS, BGP, SNMP, OSPF, PING, DHCP, TCP/IP, OSI, HTTP

● Business analytics tools: Power BI, Tableau

● Certifications: LinkedIn Advanced SQL for Data Scientists, Coursera Python Object-Oriented Programming

EXPERIENCE

Junior Software Engineer

Marlabs LLC, Piscataway, NJ, July 2022 to present

● Designed and implemented efficient pipelines for migrating millions of records from on-prem databases to S3, boosting performance by over 15% and reducing costs.

● Automated ETL processes, reducing manual workload by 10% monthly.

● Maintained 99.8% data uptime while integrating multiple data sources.

● Took part in daily stand-ups and sprint planning, and collaborated with business teams to deliver features on time using Agile methodologies.

● Rectified critical codebase errors and optimized algorithms, enhancing application performance by 40% and reducing load times by 25%.

PROJECTS

Automated Job Listings Update (Python 3, PostgreSQL, Spark 3, Amazon Redshift, Scala, AWS: CLI, EC2, S3, Glue, Lambda, CloudFormation, CloudWatch, Tableau)

● Transferred data from on-premises to S3 via EC2.

● Used Redshift to preprocess the data and compute metrics such as job listing counts and average salaries.

● Set up CloudWatch alerts to auto-scale EC2 and manage database and S3 updates.

● Implemented CloudFormation for resource organization and deployment.

● Created a Tableau dashboard to visualize statistics, improving hiring policy insights.

Financial Fraud Detection (Python, NumPy, pandas, Matplotlib, scikit-learn, TensorFlow)

● Analyzed 10 million financial transactions to uncover fraudulent activity using Python and Spark.

● Conducted extensive feature engineering on transaction attributes, enhancing model accuracy.

● Achieved precision and recall of approximately 91% and 88%, respectively, across various algorithms.

● Deployed the model on a Spark cluster using AWS EMR to process real-time data.

● Implemented continuous monitoring of the EMR cluster using AWS CloudWatch for seamless operation.

Sentiment Analysis System Development (Natural Language Processing, Python, Keras, AWS, Kafka, Spark)

● Collaborated with a data science team to build a sentiment analyzer that reached approximately 95% accuracy on client and user reviews.

● Engineered deep learning models (LSTM, BERT) to perform sentiment analysis and topic modeling on extensive review datasets.

● Led the setup of a real-time data pipeline for continuous sentiment analysis model updates, using Apache Kafka and Spark Streaming.

● Employed AWS cloud services to seamlessly scale the model for production use.

Collaborative-Filtering-Based Recommender System (Python, PySpark, SQLite)

● Conducted Exploratory Data Analysis to pinpoint crucial dataset attributes and select pertinent fields.

● Imported data from cloud storage to Spark dataframes using Databricks with PySpark.

● Performed feature engineering: eliminated irrelevant columns, converted to Parquet format, hashed columns for partitioning, and repartitioned for enhanced parallelization.

● Utilized PySpark SQL to create a window specification and rank validation data.

● Trained the model using alternating least squares (ALS) from the PySpark MLlib package.

● Optimized hyperparameters and assessed model performance using MAP (Mean Average Precision).
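
For reference, the MAP metric named in the last bullet can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the dictionary-based inputs, and the choice to normalize by the number of relevant items are all assumptions made for the example.

```python
from typing import Dict, Hashable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items.

    Assumes `ranked` contains no duplicate items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this cut-off: relevant items seen so far / items seen so far.
            precision_sum += hits / position
    # Hypothetical convention: normalize by the total number of relevant items.
    return precision_sum / len(relevant)


def mean_average_precision(
    recommendations: Dict[Hashable, Sequence[Hashable]],
    ground_truth: Dict[Hashable, Set[Hashable]],
) -> float:
    """Mean of per-user average precision; users with no relevant items are skipped."""
    scores = [
        average_precision(recommendations.get(user, []), items)
        for user, items in ground_truth.items()
        if items
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    recs = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
    truth = {"u1": {"a", "c"}, "u2": {"y"}}
    print(mean_average_precision(recs, truth))
```

With the toy data above, user u1 scores (1/1 + 2/3) / 2 and user u2 scores 1/2, so the mean comes out to roughly 0.667. Other normalization conventions exist, so values may differ slightly from a given library's implementation.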
