
Data Engineer Quality

Location:
Front Royal, VA
Salary:
85000
Posted:
July 02, 2024


Resume:

Nookesh V
DATA ENGINEER
Maryland, USA 551-***-**** ***.******@*****.***

SUMMARY

• Data Engineer with over * years of experience in designing, implementing, and optimizing data ingestion, processing, and storage solutions across various platforms, including AWS and GCP.

• Proficient in developing scalable data ingestion pipelines using Apache Kafka, AWS Kinesis, and AWS Lambda, processing over 50 million real-time events daily.

• Leveraged tools such as Power BI, Amazon QuickSight, and Python libraries (Pandas, NumPy, Matplotlib) for data analysis and visualization, delivering actionable insights.

• Implemented data quality checks, monitoring, and data profiling frameworks, reducing data quality issues and ensuring data integrity and reliability.

• Expertise in utilizing big data technologies like Apache Spark, Hadoop, Hive, and Sqoop to handle and process large volumes of structured and unstructured data, resulting in significant processing time reductions.

• Built real-time data ingestion pipelines using Apache Kafka and AWS Kinesis, processing millions of events daily, and integrating NoSQL databases like MongoDB and Cassandra for high-throughput data storage.

• Designed and maintained centralized data warehouses on AWS Redshift, enabling self-service analytics for business users and enhancing data-driven decision-making.

• Collaborated effectively with data scientists, analysts, and stakeholders to translate business requirements into technical solutions, demonstrating strong communication and problem-solving skills.

• Effective communicator with excellent problem-solving abilities, experienced in writing test cases, implementing test classes, and managing error handling to ensure software quality and reliability.

SKILLS

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, MATLAB, R, Scala

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn

Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP), Amazon QuickSight

IDEs: Visual Studio Code, PyCharm, Jupyter Notebook

Cloud Platforms: Amazon Web Services (IAM, S3, VPC, EC2, Athena, AWS Glue, Lambda, CloudWatch, EMR, Redshift, SageMaker), Google Cloud Platform

Databases: MySQL, Oracle SQL, PL/SQL, PostgreSQL, MongoDB

Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, Apache Airflow, ETL/ELT

Other Technical Skills: SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Amazon Redshift, Snowflake, Alteryx, Google BigQuery, Data Quality and Governance, Machine Learning, Big Data, Grafana, Prometheus, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving

Version Control Tools: Git, GitHub

Operating Systems: Windows, Linux, Mac OS

PROFESSIONAL EXPERIENCE

Data Engineer | Wells Fargo, MD | Oct 2022 – Present

• Developed a scalable data ingestion pipeline using Apache Kafka, AWS Kinesis, and AWS Lambda to process and load over 50 million real-time events per day from various sources into an AWS Data Lake on Amazon S3.
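
For illustration, a minimal sketch of the Lambda stage of such a pipeline, assuming a hypothetical bucket and key prefix (not taken from this resume):

import base64
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake"   # hypothetical bucket name
PREFIX = "raw/events/"         # hypothetical key prefix

def handler(event, context):
    # Kinesis delivers records base64-encoded inside the Lambda event.
    records = [
        json.loads(base64.b64decode(r["kinesis"]["data"]))
        for r in event["Records"]
    ]
    # Persist the whole batch as one newline-delimited JSON object in S3.
    body = "\n".join(json.dumps(r) for r in records)
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{PREFIX}{uuid.uuid4()}.json",
        Body=body.encode("utf-8"),
    )
    return {"written": len(records)}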

• Developed and optimized Sqoop jobs to import and store 10 TB of data daily in HDFS and Hive, applying Pig transformation scripts to the unstructured data.

• Designed and implemented a distributed data processing framework using Apache Spark on AWS EMR to perform batch and real-time analytics on structured and unstructured data, resulting in a 40% reduction in processing time.
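
A hedged sketch of the Spark batch side of such a job, with an invented input path, schema, and aggregation standing in for the real workload:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-analytics").getOrCreate()

# Hypothetical input path and event_ts column.
events = spark.read.json("s3://example-data-lake/raw/events/")

# Daily event counts per source; partitioning the output by date keeps
# downstream scans cheap.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "source")
    .agg(F.count("*").alias("event_count"))
)
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-data-lake/curated/daily_counts/"
)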

• Built and maintained a centralized data warehouse on AWS Redshift, incorporating data from multiple on-premises and cloud sources, enabling self-service analytics for over 100 business users.

• Implemented data quality checks and monitoring using AWS Glue, AWS Lambda, and Amazon CloudWatch to ensure data integrity and reliability, reducing data quality issues by 60%.
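
One plausible shape for such a check, publishing a null-rate metric to CloudWatch so an alarm can fire on regressions; the namespace, metric name, and paths are assumptions:

import boto3
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-check").getOrCreate()
cloudwatch = boto3.client("cloudwatch")

df = spark.read.parquet("s3://example-data-lake/curated/daily_counts/")  # hypothetical path
total = df.count()
nulls = df.filter(F.col("source").isNull()).count()
null_rate = (nulls / total) if total else 0.0

# Publish the rate as a custom metric; a CloudWatch alarm on this metric
# turns a silent data problem into a page.
cloudwatch.put_metric_data(
    Namespace="DataQuality",  # hypothetical namespace
    MetricData=[{"MetricName": "SourceNullRate", "Value": null_rate, "Unit": "None"}],
)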

• Created robust ETL/ELT processes using Python and Apache Airflow, resulting in a 30% increase in data processing efficiency.
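
A pared-down Airflow 2.x DAG showing the extract-transform-load shape; the DAG id, schedule, and task bodies are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    print("pull from source systems")   # placeholder extract step

def transform(**_):
    print("clean and reshape")          # placeholder transform step

def load(**_):
    print("load into the warehouse")    # placeholder load step

with DAG(
    dag_id="example_etl",             # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task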

• Integrated MongoDB and Cassandra for real-time and high-throughput data storage requirements, optimizing the performance and scalability of NoSQL databases within the data architecture.
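
A tiny pymongo sketch of the high-throughput write path; the URI, database, and collection names are invented:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
events = client["analytics"]["events"]             # hypothetical db/collection

# ordered=False lets the server parallelize inserts and keeps one bad
# document from aborting the rest of the batch.
batch = [{"event_id": i, "kind": "click"} for i in range(1000)]
events.insert_many(batch, ordered=False)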

• Collaborated with data scientists and analysts to understand business requirements and translate them into technical solutions, enabling data-driven decision-making across multiple business units.

• Automated data processing workflows using AWS Data Pipeline and AWS Step Functions, reducing manual effort and ensuring timely delivery of analytical insights.
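
Triggering such a workflow from code might look like the following boto3 call; the state machine ARN and input are placeholders:

import json

import boto3

sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:example",  # placeholder ARN
    input=json.dumps({"run_date": "2024-07-01"}),
)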

• Implemented data security and access control measures using AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), and AWS Lake Formation, ensuring compliance with data privacy and security regulations.

• Leveraged Power BI and Amazon QuickSight for ad-hoc data querying and interactive data visualization, empowering business users with self-service analytics capabilities.

• Implemented monitoring and alerting systems for data pipelines using Grafana and Prometheus, reducing downtime by 20%.
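
A minimal sketch of this kind of instrumentation with prometheus_client; metric names and the scrape port are illustrative:

import time

from prometheus_client import Counter, Gauge, start_http_server

ROWS_PROCESSED = Counter("pipeline_rows_processed_total", "Rows processed")
FAILURES = Counter("pipeline_failures_total", "Failed batches")
LAG_SECONDS = Gauge("pipeline_lag_seconds", "Seconds the pipeline lags its source")

def process_batch(batch):
    try:
        # ... real processing would happen here ...
        ROWS_PROCESSED.inc(len(batch))
    except Exception:
        FAILURES.inc()
        raise

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes :8000/metrics; Grafana charts it
    while True:
        process_batch([])
        LAG_SECONDS.set(0.0)
        time.sleep(60)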

Data Engineer | Accenture, India | Sept 2019 – Dec 2021

• Architected a real-time fraud detection system using Kafka and PySpark, processing 1 million transactions per hour with 99.9% accuracy. Developed Airflow DAGs to orchestrate daily ETL processes, reducing pipeline failures by 40% and improving data freshness by 6 hours.
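
A hedged sketch of the streaming shape described (Spark Structured Streaming reading Kafka); the brokers, topic, schema, and placeholder rule are assumptions, and the spark-sql-kafka connector must be on the Spark classpath:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account", StringType()),
    StructField("amount", DoubleType()),
])

txns = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "transactions")               # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

# Placeholder rule; the real system would apply a trained model or rules engine.
flagged = txns.filter(F.col("amount") > 10000)

query = flagged.writeStream.format("console").outputMode("append").start()
query.awaitTermination()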

• Optimized Hive queries for customer segmentation analysis, decreasing runtime by 35% on tables exceeding 100 million rows. Implemented a data lake solution on Google Cloud Platform, migrating 75 TB of historical data from on-premises systems with 99.99% data integrity.

• Engineered a Spark-based data quality framework, automatically validating 500+ data quality rules daily, catching 95% of anomalies before production.
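
One way such a rule-driven framework can be structured: each rule is a SQL predicate that must hold for every row, and violations are counted per rule. The rules and path shown are invented examples:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-framework").getOrCreate()
df = spark.read.parquet("s3://example-data-lake/curated/orders/")  # hypothetical path

# Each rule is a SQL predicate that must hold for every row.
RULES = {
    "order_id_not_null": "order_id IS NOT NULL",
    "amount_non_negative": "amount >= 0",
    "status_in_domain": "status IN ('OPEN', 'SHIPPED', 'CLOSED')",
}

failures = {}
for name, predicate in RULES.items():
    # Rows where the predicate is false or null count as violations.
    bad = df.filter(f"NOT ({predicate}) OR ({predicate}) IS NULL").count()
    if bad:
        failures[name] = bad

if failures:
    raise ValueError(f"Data quality failures: {failures}")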

• Utilized PySpark and Python libraries (Pandas, NumPy) to build a churn prediction model, improving accuracy by 18% over legacy systems. Developed SSIS packages for nightly data mart updates, processing 30 GB of incremental data with 99.9% reliability.
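
For illustration, a toy churn classifier in scikit-learn (listed in the skills section) on synthetic data; the features and label are fabricated stand-ins:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 500),
    "monthly_spend": rng.uniform(10, 200, 500),
    "support_tickets": rng.integers(0, 10, 500),
})
y = (df["support_tickets"] > 5).astype(int)  # synthetic churn label

X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))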

• Designed and implemented REST APIs using Python Flask, enabling real-time data access for 50+ downstream applications. Created an Elasticsearch-based search solution for customer service, reducing average query time by 60% and improving user experience.
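
A skeleton of such a Flask read endpoint; the route, backing store, and payload shape are illustrative:

from flask import Flask, abort, jsonify

app = Flask(__name__)

# In-memory stand-in for whatever store backs the real API.
CUSTOMERS = {"42": {"customer_id": "42", "segment": "premium"}}

@app.route("/customers/<customer_id>", methods=["GET"])
def get_customer(customer_id):
    record = CUSTOMERS.get(customer_id)
    if record is None:
        abort(404)
    return jsonify(record)

if __name__ == "__main__":
    app.run(port=5000)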

• Implemented k-means clustering using PySpark to segment 10 million customers, leading to a 25% increase in marketing campaign effectiveness.
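
A sketch of customer segmentation with pyspark.ml KMeans; the input path, feature columns, and k=5 are assumptions:

from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("segmentation").getOrCreate()
customers = spark.read.parquet("s3://example-data-lake/curated/customers/")  # hypothetical path

# Assemble assumed RFM-style feature columns into a single vector.
assembler = VectorAssembler(
    inputCols=["recency_days", "frequency", "monetary"],
    outputCol="features",
)
vectors = assembler.transform(customers)

model = KMeans(k=5, seed=42, featuresCol="features").fit(vectors)  # k=5 is an assumption
segments = model.transform(vectors).select("customer_id", "prediction")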

• Optimized BigQuery performance by implementing partitioning and clustering, reducing query costs by 40% and improving response times by 50%.
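
One way to apply that partitioning and clustering, via DDL issued through the google-cloud-bigquery client; dataset, table, and column names are invented:

from google.cloud import bigquery

client = bigquery.Client()
client.query(
    """
    CREATE TABLE IF NOT EXISTS analytics.events_clustered  -- hypothetical names
    PARTITION BY DATE(event_ts)
    CLUSTER BY customer_id, event_type
    AS SELECT * FROM analytics.events_raw
    """
).result()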

• Utilized GCP's Dataflow to create streaming ETL pipelines, ingesting 10,000 events per second from IoT devices with sub-second latency.
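
A minimal Apache Beam pipeline in the shape Dataflow runs: read from Pub/Sub, window, count per device. The topic and field names are placeholders:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)  # run with the DataflowRunner in production

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/example/topics/iot")  # hypothetical topic
        | "Parse" >> beam.Map(json.loads)
        | "Window" >> beam.WindowInto(FixedWindows(60))  # 60-second fixed windows
        | "KeyByDevice" >> beam.Map(lambda e: (e["device_id"], 1))  # assumed field
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )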

• Developed a custom Hadoop MapReduce job to perform complex aggregations on 5 years of transaction data, reducing processing time from 24 hours to 4 hours.

EDUCATION

Master of Science in Information Systems – University of Maryland, Baltimore County, Maryland, USA

Bachelor of Technology in Electrical and Electronics Engineering – Amrita University, Bangalore, India

CERTIFICATION

AWS Certified Solutions Architect - Associate


