
Data Engineer Quality

Location:
Cleveland, OH
Posted:
January 18, 2024


Navya Kakumani 216-***-**** ad2vy2@r.postjobfree.com www.linkedin.com/in/navya8345

Highly skilled Data Engineer with 3 years of experience and a strong background in designing and implementing robust data infrastructure solutions. Experienced in gathering, processing, and analyzing large volumes of structured and unstructured data to drive meaningful insights and support strategic business decisions. Proficient in developing scalable ETL (Extract, Transform, Load) pipelines, data warehousing, and data modeling techniques.

SKILLS

Languages: Python (pandas, NumPy, scikit-learn, Matplotlib), R (ggplot2, Shiny), Scala, SQL, C++, Java.

Databases: RDBMS (Oracle, MySQL, IBM DB2, PostgreSQL), NoSQL databases (MongoDB, Apache Cassandra), AWS DynamoDB, AWS Redshift, Azure SQL Database

Big Data Technologies: Apache Spark, Hadoop, SSIS, ETL, Kafka, PySpark, Snowflake, Airflow, MapReduce.

Cloud Platform & Tools: IBM Cloud, Databricks, Amazon Web Services (Kinesis, S3, Lambda, EC2, Redshift), Microsoft Azure (Data Factory, Cosmos DB, HDInsight, Databricks, Data Lake, Synapse).

Data Visualization and Analysis Tools: Tableau, Power BI, Advanced Excel (VLOOKUP, INDEX/MATCH, complex nested formulas, Power Query, PowerPivot, and DAX for advanced data processing).

Data Warehousing & Streaming: Data Warehouse Architecture, Data Warehouse Management, Data Modeling, SQL for Data Warehousing, ETL, AWS Redshift, AWS Kinesis, AWS Lambda, EC2, Real-time Data Ingestion, Stream Processing, Event-driven Architectures, Real-time Analytics.

DevOps & Collaboration: Jira, Docker, Jenkins, Kubernetes.

EXPERIENCE

Veeva Systems Inc., (Remote) USA June 2023 – Present
Data Engineer

•Employed AWS Glue for ETL operations and AWS Lambda for running custom data quality scripts to develop comprehensive data quality checks and automated validation processes. This improved data accuracy and reliability by identifying inconsistencies, missing values, and data integrity issues before the data was loaded into the analytics platform.

•Developed and maintained ETL processes using Hadoop tools like HDFS, MapReduce, and Hive, with a focus on optimizing job execution times.

•Implemented AWS CloudWatch and AWS X-Ray to proactively identify pipeline issues, track data quality metrics, and provide timely alerts.

•Used Apache Kafka and Apache NiFi to enable real-time data streaming and ensure the timely availability of data in the analytics platform.

•Implemented Apache Airflow for authoring, scheduling, and monitoring Data Pipelines.

•Developed Spark programs in Scala, applying functional programming principles to process complex structured and unstructured data sets read from the Hadoop Distributed File System (HDFS).

•Developed Spark jobs on Databricks to perform data cleansing, validation, and standardization, then applied transformations per the use case.

•Utilized expertise in Excel to clean, cross-reference, and analyze multiple data sources, providing valuable insights.

•Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS.

•Participated in the full software development lifecycle, including requirements gathering, solution design, development, QA, implementation, and product support, using Scrum and other Agile methodologies; collaborated with team members and stakeholders on the design and development of the data environment.

•In-depth understanding of AWS services like S3, EC2, IAM, RDS, and experience with orchestration and data pipeline tools such as AWS Step Functions, Data Pipeline, and Glue.
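The data-quality checks described above (flagging missing values and integrity issues before data is loaded into the analytics platform) can be sketched in plain Python; the record fields ("id", "amount", "customer_id") and rule names are illustrative assumptions, not details from this résumé:

```python
# Minimal pre-load data-quality validation sketch: partitions incoming
# records into clean and rejected sets before the load step.
# Field names below are hypothetical placeholders.

def validate(records, known_ids):
    """Return (clean, rejected) partitions of the input records.

    Each rejected entry is a (record, problems) pair so the pipeline
    can log or quarantine bad rows instead of loading them.
    """
    clean, rejected = [], []
    for rec in records:
        problems = []
        if rec.get("id") is None:
            problems.append("missing id")
        if rec.get("amount") is None:
            problems.append("missing amount")
        elif rec["amount"] < 0:
            problems.append("negative amount")
        if rec.get("customer_id") not in known_ids:
            problems.append("unknown customer_id")
        if problems:
            rejected.append((rec, problems))
        else:
            clean.append(rec)
    return clean, rejected
```

In a Glue/Lambda setting this logic would typically run inside the Lambda handler or a Glue job script, with rejected records routed to a quarantine location rather than the analytics tables.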

GSS INFORMATICS PRIVATE LIMITED, India Aug 2019 – Jul 2021

Data Engineer

•Conducted comprehensive data analysis using Azure SQL Database and Python, devising and executing highly optimized queries for data extraction, analysis, and validation on large data sets.

•Developed Spark applications using PySpark and Spark SQL in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats.

•Extracted, transformed, and loaded data from source systems to Azure Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL with Azure Data Lake Analytics. Processed data within Azure Databricks.

•Designed, implemented, and maintained scalable and resilient data pipelines using Azure Data Factory, PySpark, and Airflow. Optimized data ingestion and transformation processes across the enterprise, handling terabytes of data, leading to a 40% reduction in turnaround time.

•Developed interactive Power BI dashboards to closely monitor key performance metrics of Data Lake and Data Platform usage. These dashboards helped reduce platform failures and cut resource consumption by 30%.

•Managed codebase using Azure DevOps, Bash, Python, and GitHub, with a focus on maintaining high coding standards and secure operations. Streamlined the Continuous Integration/Continuous Delivery (CI/CD) process by integrating security tools like Azure Security Center and Azure Advisor into the GitHub Pull Request workflow.

•Maintained meticulous attention to detail, resulting in high-quality work and thorough quality checks.

•Employed analytical expertise to solve novel problems, actively seeking solutions through innovative thinking.

•Demonstrated expertise in utilizing Azure Monitor for consistent pipeline monitoring and Azure Notification Hubs for setting up alerts and sending notifications.

•Extensive experience in working with Azure Synapse Analytics and loading data into Synapse Analytics dimension and fact tables via Azure Data Factory.

•Leveraged Azure DevOps, Git, and CI/CD pipelines to automate the deployment of data engineering code changes, reducing manual intervention and enhancing the speed and accuracy of deployments.

•Adopted Scrum and Kanban Agile methodologies for project management to ensure timely completion of deliverables and adherence to customer requirements.
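The pipeline orchestration described above (Airflow- and Data Factory-style pipelines where each step runs only after its upstream dependencies finish) reduces to executing tasks in dependency order. A stdlib-only sketch of that scheduling idea, with hypothetical task names standing in for real pipeline activities:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks and their upstream dependencies,
# mirroring an extract -> validate -> transform -> load flow.
deps = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "publish_dashboard": {"load"},
}

def run_order(dependencies):
    """Return tasks in an order that respects every dependency edge."""
    return list(TopologicalSorter(dependencies).static_order())
```

An orchestrator like Airflow adds scheduling, retries, and monitoring on top of this core ordering, but the dependency graph is the same concept expressed as a DAG of operators.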

CERTIFICATIONS

Microsoft Certified: Azure Data Engineer Associate

Certified in Internet of Things using Arduino and Raspberry Pi platforms, an online course authorized by the University of California, Irvine, and offered through Coursera.

Certified in Game Development workshop on Buildbox from APSSDC.

Attended workshops on Progressive Web Apps, Ethical Hacking, and Cyber Security.

EDUCATION

University of North Texas, Denton TX May 2023

Master of Computer Science, GPA: 3.2/4.0

Courses: Data Analytics, Big Data & Data Science, Machine Learning, Computer Algorithms, AI.

Jawaharlal Nehru Technological University, India Aug 2020

Bachelor of Technology in Computer Science and Engineering, GPA: 3.2/4.0


