Stalin Goud
Data Engineer
**************@*****.*** +1-510-***-****
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 3+ years of experience in building scalable data pipelines, optimizing ETL workflows, and automating data infrastructure using AWS, Spark, Python, and SQL. Skilled in designing high-performance data architectures to enhance analytics, reduce costs, and improve processing speeds by 30% or more. Proven expertise in big data technologies, stream processing, and cloud-native solutions, with a focus on data security, automation, and performance optimization. Adept at collaborating with cross-functional teams to translate business needs into actionable data solutions.
TECHNICAL SKILLS
Programming Languages: Scala, Python, R, SQL
IDEs: Eclipse, IntelliJ IDEA, PyCharm, Jupyter Notebook, Visual Studio Code
Big Data Ecosystem: Hadoop, MapReduce, Hive, HDFS, Spark, Kafka, PySpark, Apache Airflow, Sqoop, Flume, NiFi, Oozie, ZooKeeper, Apache Flink
Machine Learning: Linear Regression, Logistic Regression, Decision Trees, SVM, K-Means, Random Forest
Cloud Platforms and Services:
AWS: S3, EMR, Redshift, Glue, Lambda, Kinesis, Athena, DynamoDB, CloudWatch, IAM, VPC
Google Cloud: BigQuery, Cloud SQL, Cloud Composer/Airflow, Cloud Storage, Dataflow/Data Fusion, Dataproc, Pub/Sub
Azure: Data Lake, Data Factory, Databricks, Logic Apps, HDInsight, Synapse Analytics, Azure DevOps
Data Science Libraries: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow
CI/CD & DevOps Tools: Jenkins, Docker, Kubernetes, Terraform, Git, Ansible
BI & Analytics: Tableau, Power BI, QuickSight, SSIS, SSRS, SSMS
Databases: SQL Server, PostgreSQL, MongoDB, HBase
Security & Compliance: Data Encryption, Role-Based Access Control (RBAC), GDPR, AWS IAM
Operating Systems: Windows, macOS, Linux
WORK EXPERIENCE
Principal Financial | New Jersey, USA | September 2022 - Present
Data Engineer
Leveraged Spark and Scala APIs for comparative performance analysis between Hive and Spark SQL, leading to a 25% improvement in data processing efficiency.
Designed and implemented scalable ETL pipelines using AWS Glue, Redshift, and PySpark, improving data processing speeds by 30%.
Developed and optimized real-time streaming solutions using Kafka and AWS Kinesis, reducing latency for critical business insights.
Engineered and deployed Tableau dashboards, resulting in a 10% productivity boost for the team by providing real-time, actionable insights.
Automated data infrastructure and workflow orchestration using Apache Airflow, improving pipeline reliability and efficiency by 40%.
Migrated on-prem SQL workloads to AWS, leveraging S3, Athena, and Redshift, cutting operational costs by 20%.
Built data security frameworks using AWS IAM, encryption, and multi-region backups, ensuring compliance with GDPR and other security standards.
Optimized PySpark scripts, achieving a 30% increase in processing speed and significantly enhancing data quality.
Orchestrated a full-scale AWS migration using Lambda, S3, and Redshift, improving data quality scores by 50% and boosting machine learning model accuracy by 15%.
Collaborated with product, analytics, and DevOps teams to support data infrastructure scalability, enhancing reporting and analytics capabilities by 35%.
Streamlined Hive schema design and implemented advanced performance techniques, resulting in a 45% improvement in query efficiency and a 30% reduction in processing time.
Integrated Apache Airflow with AWS to oversee multi-stage ML workflows and developed CI/CD pipelines with Git and Jenkins, improving deployment efficiency by 40%.
Architected and executed robust data pipelines on GCP using Apache Airflow, significantly improving workflow automation and reliability.
KPIT Technologies | Maharashtra, India | August 2020 - July 2022
Data Engineer (promoted from Data Engineer Intern)
Developed high-performance ETL pipelines with Spark SQL to process 100GB+ of structured and unstructured data daily, reducing query execution time by 25% while maintaining data quality.
Optimized SQL queries and refined Hive-based data pipelines, improving data retrieval speed by 20%.
Engineered Azure Data Lake and Data Factory pipelines to handle structured and unstructured data, boosting analytics capabilities by 30%.
Developed a Python-based real-time data pipeline using Apache Kafka and AWS Lambda, improving data throughput by 25% and enabling real-time actionable insights.
Leveraged Apache Airflow for optimized pipeline execution, improving processing efficiency by 25%.
Executed data cleansing workflows using Python and SQL, reducing inaccuracies by 35%, leading to more reliable reporting and decision-making.
Created Power BI dashboards to visualize critical KPIs, saving 10 hours weekly in manual reporting tasks.
Architected and deployed ETL processes with AWS Glue for seamless transfer of Parquet data from S3 to Amazon Redshift.
EDUCATION
New York Institute of Technology | May 2024
Master of Science in Computer Science, with a specialization in Cyber Security
Vaagdevi College of Engineering | May 2022
Bachelor's degree in Computer Science & Engineering