NEHANTH MOGANTI
Sr. Data Engineer
*********@*****.*** +1-943-***-****
PROFESSIONAL SUMMARY
Proficient Senior Data Engineer with around 9 years of hands-on experience designing, developing, and implementing scalable data pipelines, ETL processes, and data warehousing solutions.
Expertise in Python for data manipulation and analysis using Pandas, NumPy, PyTorch, TensorFlow, Scikit-Learn, and SciPy, applying machine learning algorithms for predictive analytics.
Skilled in Scala for developing robust applications and leveraging Shell Scripting for automation and orchestration of data workflows.
Extensive knowledge of SQL and PL/SQL for managing relational databases, including MySQL, PostgreSQL, SQL Server, Oracle, AWS RDS, Snowflake, and GCP BigQuery, optimizing queries for performance.
Proficient in big data technologies such as Hadoop HDFS, MapReduce, Apache Spark (Spark SQL, Spark Streaming, PySpark), Hive, Pig, Sqoop, Apache NiFi, and Databricks, ensuring efficient data processing and analysis at scale.
Hands-on experience with cloud platforms including AWS (S3, EC2, EMR, Athena, Lambda, Step Functions, Data Pipeline, Redshift, Glue, IAM, CloudFormation), GCP (Dataprep, Dataflow, Pub/Sub), and Azure (Data Factory, Active Directory, CosmosDB, Azure Data Lake Storage), leveraging their services for data storage, processing, and orchestration.
Skilled in containerization and orchestration tools like Docker and Kubernetes, implementing scalable and portable data solutions.
Experienced in version control, continuous integration, and deployment using Git, Jenkins, GitHub, GitLab, Bitbucket, and Azure DevOps, ensuring smooth development workflows and collaboration.
Proficient in data visualization tools such as Tableau, Power BI, and Looker, and utilizing ELK stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, and Splunk for monitoring and analytics.
Strong Agile practitioner with experience in Scrum and Kanban methodologies, adept at managing project lifecycles, user stories, and sprints using JIRA, ServiceNow, and Bugzilla.
Demonstrated strong analytical and problem-solving skills, adept at identifying data quality issues and implementing corrective actions.
Excellent communication skills, with a proven track record of effectively presenting complex technical concepts to non-technical stakeholders.
TECHNICAL SKILLS
Programming Languages and Libraries: Python, Pandas, NumPy, PyTorch, TensorFlow, Scikit-Learn, SciPy, Scala, Shell Scripting, SQL, PL/SQL
Cloud Platforms and Services: AWS (S3, EC2, EMR, Athena, Lambda, Step Functions, Data Pipeline, Redshift, Glue, IAM, CloudFront, CloudFormation); GCP (Dataprep, Dataflow, Pub/Sub, BigQuery, Kubernetes, Terraform); Azure (Data Factory, Active Directory, CosmosDB, Azure Data Lake Storage)
Big Data Technologies: Hadoop HDFS, MapReduce, Hive, Pig, Sqoop, Apache NiFi, Databricks, Apache Spark, Spark SQL, Spark Streaming, PySpark
Data Warehousing: Snowflake, AWS RedShift, GCP BigQuery
Data Integration and ETL Tools: Informatica, Talend, Sqoop, Apache NiFi
Database Management: MySQL, MongoDB, PostgreSQL, SQL Server, Oracle, AWS RDS
Containerization and Orchestration: Docker, Kubernetes
Version Control and CI/CD: Git, Jenkins, GitHub, GitLab, Bitbucket, Azure DevOps
Data Visualization and BI Tools: Looker, Tableau, Power BI
Automation and Configuration Management: Ansible, Airflow
Monitoring and Logging: ELK stack, Prometheus, Grafana, Splunk
Security and Authentication: OAuth, AWS IAM, Azure AD
Project Management and Agile Methodologies: JIRA, Agile (Scrum, Kanban), ServiceNow, Bugzilla
PROFESSIONAL EXPERIENCE
Bank of America, Charlotte, NC Sr. Data Engineer Apr 2022 - Present
Responsibilities:
Implemented and managed end-to-end data pipelines using AWS S3, AWS EC2, and AWS EMR for scalable data processing and analytics, achieving 100% reliability in data processing workflows.
Designed and optimized data processing workflows on Hadoop and Spark, utilizing MapReduce and Spark jobs to achieve a 30% improvement in data manipulation efficiency.
Utilized AWS Athena for interactive querying and analysis of data stored in S3, optimizing performance and cutting data analysis costs by 50% (a representative PySpark sketch follows this section).
Implemented serverless architectures using AWS Lambda and orchestrated workflows with AWS Step Functions and Data Pipeline, resulting in a 60% reduction in operational costs.
Managed and optimized data storage and retrieval on HDFS, ensuring scalability and reliability for 1PB of data storage capacity.
Developed and optimized Hive and Pig scripts for data transformation and processing within Hadoop ecosystems.
Implemented data ingestion and integration workflows using Sqoop, ensuring seamless data transfer between Hadoop and relational databases.
Designed and optimized SQL queries for extract, transform, and load (ETL) processes across databases including MySQL and MongoDB, achieving a 40% improvement in data extraction efficiency.
Developed data processing scripts and workflows using Python, Scala, Pandas, NumPy, and PyTorch for advanced analytics and machine learning, enabling predictive models with 90% accuracy.
Implemented and optimized data warehousing solutions on AWS Redshift, achieving 99.9% uptime for high-performance analytics and reporting.
Designed and implemented ETL jobs using AWS Glue for data integration, ensuring 95% automation in data transformation workflows.
Managed access controls and permissions using AWS IAM, ensuring data security and compliance with organizational policies.
Automated infrastructure deployments using AWS CloudFormation and Ansible, achieving an 80% reduction in deployment time.
Orchestrated data workflows using Airflow, ensuring 99% reliability in scheduled data processing tasks.
Processed and transformed XML data using XSLT for integration into downstream workflows, ensuring consistently well-formed and correctly mapped output.
Containerized data applications and services using Docker and orchestrated container deployments using Kubernetes for scalability and reliability.
Developed and maintained data visualizations and dashboards using Looker, providing actionable insights to stakeholders.
Implemented version control and collaborative development workflows using Git, ensuring code quality and team productivity.
Utilized SAS for statistical analysis, providing 90% accuracy in data exploration and trend analysis.
Managed CI/CD pipelines using Jenkins, achieving 95% automation in testing and deployment processes.
Implemented Agile methodologies (Scrum), delivering 100% of project milestones on time.
Tracked and managed project tasks and workflows using Jira, ensuring alignment with project timelines and deliverables.
Documented technical designs, data architectures, and process workflows to facilitate knowledge sharing and compliance with industry standards.
Environment: AWS S3, AWS EC2, AWS EMR, Hadoop, Spark, AWS Athena, AWS Lambda, AWS Step Functions, HDFS, MapReduce, Hive, Pig, Sqoop, SQL, Python, Scala, AWS Redshift, MySQL, MongoDB, AWS Glue, AWS IAM, AWS CloudFormation, Ansible, Airflow, XML, XSLT, Docker, Kubernetes, Looker, Git, SAS, Jenkins, Agile, Scrum, Jira.
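Code sketch (illustrative): a minimal PySpark job for the S3-to-Athena pattern described above; the bucket names and columns (transaction_id, transaction_ts, amount) are hypothetical placeholders, and this is a simplified sketch rather than the production pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transactions_etl").getOrCreate()

# Read raw CSV files landed in S3 (bucket and prefix are placeholders)
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("s3://example-raw-bucket/transactions/"))

# Deduplicate, derive a date partition column, and drop incomplete rows
cleaned = (raw
           .dropDuplicates(["transaction_id"])
           .withColumn("txn_date", F.to_date("transaction_ts"))
           .filter(F.col("amount").isNotNull()))

# Write partitioned Parquet back to S3 so Athena can prune partitions and scan less data
(cleaned.write
 .mode("overwrite")
 .partitionBy("txn_date")
 .parquet("s3://example-curated-bucket/transactions/"))

Partitioning on a date column and storing columnar Parquet is what drives the Athena cost reduction cited above, since queries scan only the partitions and columns they need.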
CVS Health, Scottsdale, AZ Data Engineer Oct 2019 - Mar 2022
Responsibilities:
Designed and implemented data processing pipelines on Google Cloud Platform (GCP) using Dataprep, Dataflow, and Pub/Sub for real-time data ingestion and transformation (a representative Dataflow sketch follows this section).
Managed and optimized CDH clusters (Cloudera Distribution of Hadoop), including HDFS storage and YARN resource management, for large-scale data processing, supporting 100TB of data storage capacity.
Implemented Kafka for real-time data streaming and message queuing, ensuring 100% reliability in data delivery across systems.
Developed and optimized Apache Spark jobs in Scala for data processing and analytics, handling 10PB of data efficiently.
Utilized NumPy and Pandas libraries for data manipulation and analysis, performing statistical computations and transformations with 95% accuracy.
Implemented machine learning models using Scikit-Learn and TensorFlow, providing 85% accuracy in predictive analytics.
Integrated OAuth authentication for secure API access and data authorization, ensuring 100% compliance with security policies.
Designed and implemented data integration workflows using Informatica, ensuring seamless data flow and transformation between systems.
Implemented and managed ELK stack (Elasticsearch, Logstash, Kibana) to log, monitor, and analyze data processing workflows and system performance.
Utilized Terraform for infrastructure as code (IaC), automating deployment and management of data engineering environments on GCP, reducing deployment time by 70%.
Developed REST APIs for data access and integration, delivering interoperable and scalable data services with 99.9% availability.
Managed and optimized Hive queries and data warehouse operations for 50% faster data storage, retrieval, and analysis.
Leveraged Google BigQuery for querying and analyzing large datasets, achieving a 60% improvement in performance and scalability for analytical workloads.
Implemented Google Analytics for tracking and analyzing data metrics, providing insights that informed business decision-making.
Set up and maintained CI/CD pipelines using Jenkins for automated testing, build, and deployment of data engineering solutions.
Utilized Sqoop for efficient data transfer between Hadoop and relational databases like PostgreSQL, ensuring data consistency and integrity.
Managed project dependencies and builds using Maven, facilitating streamlined development and deployment processes.
Processed and analyzed data stored in JSON formats, ensuring compatibility and usability across various data processing applications.
Developed interactive dashboards and visualizations using Tableau, delivering actionable insights to stakeholders 70% faster.
Implemented Agile methodologies to manage and prioritize data engineering tasks and projects, including Kanban boards and sprint planning.
Tracked and managed project workflows and tasks using JIRA, ensuring alignment with project timelines and deliverables.
Documented technical specifications, data mappings, and architectural diagrams to support knowledge sharing and compliance with industry standards.
Environment: Google Cloud Platform (GCP), CDH, Kafka, Apache Spark, NumPy, Pandas, Scala, Scikit-Learn, TensorFlow, OAuth, Informatica, ELK stack, Terraform, REST APIs, Hive, Google BigQuery, Google Analytics, Jenkins, Sqoop, PostgreSQL, Maven, JSON data formats, Tableau, Agile, Kanban, JIRA.
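Code sketch (illustrative): a minimal Apache Beam pipeline in Python for the Dataflow pattern referenced above, streaming messages from Pub/Sub into BigQuery; the project, topic, table, and schema names are hypothetical placeholders.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Streaming options; runner, project, and temp_location are placeholders
    options = PipelineOptions(
        streaming=True,
        runner="DataflowRunner",
        project="example-project",
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
               topic="projects/example-project/topics/events")
         | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
               "example-project:analytics.events",
               schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

if __name__ == "__main__":
    run()

The same pipeline can be exercised locally with the DirectRunner before being submitted to Dataflow.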
Byteridge Software, India Data Engineer Apr 2017 - Dec 2018
Responsibilities:
Developed and orchestrated data pipelines using Apache NiFi to ingest, transform, and route data across various systems and platforms.
Automated data processing tasks and system administration using PowerShell scripts, enhancing efficiency and reducing manual intervention.
Implemented data processing workflows on Databricks and Apache Spark for large-scale data analytics and machine learning applications.
Used Python with the Pandas and NumPy libraries for data manipulation, analysis, and statistical modeling in data engineering projects (a representative Pandas sketch follows this section).
Designed and optimized MapReduce jobs to process and analyze massive datasets in distributed computing environments like Hadoop.
Monitored and maintained data infrastructure performance using Prometheus and Grafana, ensuring system reliability and optimization.
Managed and administered SQL Server databases, including schema design, query optimization, and performance tuning for data storage and retrieval.
Designed and implemented data integration workflows using Azure Data Factory (ADF), orchestrating data movement and transformation across Azure services.
Implemented version control and CI/CD pipelines using GitHub and Azure DevOps, ensuring robust code management and automated deployment processes.
Managed and optimized data warehousing solutions on Snowflake, ensuring scalability and performance for analytical workloads.
Integrated and managed identity and access controls using Azure Active Directory (AAD), ensuring secure access to Azure resources and data.
Developed interactive reports and dashboards using Power BI, providing actionable insights and visualizations for stakeholders.
Designed and implemented NoSQL database solutions using CosmosDB, optimizing data storage and retrieval for scalable applications.
Implemented and managed Hadoop clusters for distributed storage and processing of large datasets, leveraging HDFS and MapReduce technologies.
Implemented data lake solutions using Azure Data Lake Storage, facilitating scalable storage and analytics of structured and unstructured data.
Utilized TensorFlow to implement and optimize machine learning models within data engineering workflows.
Managed and automated IT service management processes using ServiceNow, ensuring efficient handling of data-related incidents and requests.
Documented technical designs, data architectures, and process workflows to facilitate knowledge sharing and compliance with regulatory standards.
Environment: Apache NiFi, PowerShell, Databricks, Apache Spark, Python, Pandas, NumPy, MapReduce, SQL Server, Azure Data Factory, GitHub, Azure DevOps, Snowflake, Azure Active Directory, Power BI, CosmosDB, Hadoop, Azure Data Lake Storage, Prometheus, Grafana, TensorFlow, ServiceNow.
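Code sketch (illustrative): a small Pandas/NumPy transformation of the kind referenced above; the input file and column names (order_date, region, amount) are hypothetical, and a snippet like this would normally sit inside a larger Databricks or NiFi workflow rather than run standalone.

import numpy as np
import pandas as pd

# Load a raw extract (file name and columns are placeholders)
orders = pd.read_csv("orders_extract.csv", parse_dates=["order_date"])

# Standardize categorical values and coerce amounts to numeric
orders["region"] = orders["region"].str.strip().str.upper()
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")
orders = orders.dropna(subset=["amount"])

# Aggregate monthly revenue per region and add a log-scaled column for modeling
monthly = (orders
           .assign(month=orders["order_date"].dt.to_period("M"))
           .groupby(["region", "month"], as_index=False)["amount"]
           .sum())
monthly["log_amount"] = np.log1p(monthly["amount"])

monthly.to_csv("monthly_revenue_by_region.csv", index=False)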
Taction Software, India Data Engineer Jan 2015 - Mar 2017
Responsibilities:
Developed and maintained ETL processes using Apache Sqoop to transfer data between Hadoop and relational databases efficiently.
Implemented data ingestion pipelines using Apache Sqoop and Talend to integrate diverse data sources into Hadoop and AWS S3 for analytics purposes.
Designed and optimized data storage solutions on AWS S3 and AWS RDS, ensuring scalability and performance for big data applications.
Utilized Shell Scripting to automate data workflows and streamline data processing tasks within Hadoop and AWS EC2 environments.
Managed and administered Hadoop ecosystem components, including HDFS, MapReduce, YARN, and Hive for data storage, processing, and querying.
Developed and maintained HiveQL scripts for data transformation and querying large-scale datasets stored in Hadoop.
Implemented and maintained data pipelines using AWS Lambda functions for real-time data processing and event-driven architectures (a representative Lambda handler sketch follows this section).
Created and optimized SQL and PL/SQL queries for data extraction, transformation, and loading (ETL) processes within Oracle databases.
Implemented version control using Git to track changes in ETL scripts, ensuring code reliability and collaboration across the team.
Monitored data pipelines and infrastructure performance using Splunk, ensuring data integrity and proactively addressing issues.
Collaborated with cross-functional teams to troubleshoot and resolve ETL processes and data pipeline issues.
Designed and implemented scalable data models and schemas in AWS RDS for efficient data storage and retrieval.
Utilized Bugzilla to track and manage data-related issues and enhancements throughout the development lifecycle.
Designed and implemented automated testing frameworks for ETL processes, ensuring data quality and reliability.
Documented technical designs, data mappings, and process flows to facilitate knowledge transfer and compliance with organizational standards.
Environment: Hadoop, Sqoop, Shell Scripting, SQL, PL/SQL, Oracle, Git, AWS S3, AWS Lambda, AWS EC2, Hive, Splunk, Talend, AWS RDS, Bugzilla.
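Code sketch (illustrative): a minimal AWS Lambda handler in Python for the event-driven pattern referenced above, reacting to S3 object-created events; the bucket layout, processed/ prefix, and JSON payload fields are assumptions, not details from the actual project.

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record describes one newly created S3 object
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Fetch and parse the landed object (assumes JSON content)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        payload = json.loads(body)

        # Simple validation of the parsed payload (field name is a placeholder)
        if "event_id" not in payload:
            raise ValueError("missing event_id in " + key)

        # Hypothetical downstream step: stage validated objects under a processed/ prefix
        s3.copy_object(Bucket=bucket,
                       CopySource={"Bucket": bucket, "Key": key},
                       Key="processed/" + key)

    return {"status": "ok", "processed": len(records)}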
Education:
Master’s in Business Analytics - Mercer University, Atlanta, Georgia, USA, 2020
Bachelor of Technology in Information Technology - Andhra University, India.