
Data Engineer Senior

Location:
Denton, TX
Posted:
September 30, 2024


Krishna Eluri

Senior Data Engineer | Phone: +1-940-***-**** | Email: *********.*****@*****.***

LinkedIn: linkedin.com/in/krishna-eluri/

Professional Summary:

Senior Data Engineer with 9+ years of experience in designing and managing cloud-based data infrastructure using AWS, GCP, and Azure services.

Developed and optimized ETL pipelines using SSIS, Apache NiFi, Talend, and AWS Glue for efficient data extraction, transformation, and loading.

Architected scalable data solutions on AWS S3, Snowflake, BigQuery, and Databricks, supporting large-scale data processing and analytics.

Created and managed interactive dashboards and reports with Power BI, Tableau, and MicroStrategy for actionable business insights.

Designed and optimized databases with PostgreSQL, AWS Redshift, SQL Server, and MongoDB, ensuring high performance and data integrity.

Implemented machine learning models and NLP techniques using Python on AWS for enhanced data-driven insights and analytics.

Utilized Terraform for infrastructure as code, automating cloud resource deployment and ensuring consistent and scalable environments.

Conducted performance tuning on ETL pipelines and database queries to improve data processing speed and efficiency.

Integrated data from diverse sources using Python, SQL, and cloud services to ensure seamless data flow and consistency.

Managed migration of legacy systems to modern cloud environments, ensuring data integrity and compatibility with new technologies.

Developed complex DAX queries and advanced visualizations for Power BI and Tableau to provide deep analytical capabilities and real-time reporting.

Conducted data security and compliance audits, implementing best practices and leveraging Python and Terraform for infrastructure management.

Collaborated with business analysts and clients to translate requirements into technical solutions, enhancing stakeholder satisfaction.

Proficient with tools and technologies including SQL, Python, Terraform, Apache Kafka, Spark, PySpark, and various cloud services.

Applied Agile methodologies to manage and deliver complex data engineering projects, ensuring timely and successful completion.

Technical Skills:

Programming Languages

Python, Java, SQL, PySpark, Spark SQL

Databases (RDBMS and NoSQL)

PostgreSQL, SQL (T-SQL), PL/SQL, MySQL, Oracle 19c, SQL Server (RDBMS)

Cloud Technologies

Microsoft Azure (Azure Blob Storage, Azure Data Lake Storage (ADLS Gen2), Azure Data Factory, Azure Synapse Analytics, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hubs, Azure SQL, Azure Databricks); Google Cloud Platform (GCP) (Google Cloud Storage, BigQuery, Cloud SQL, Dataflow, Dataproc, Databricks on GCP)

Data Visualization

Power BI, Tableau, MicroStrategy, SSRS, Crystal Reports

ETL Tools

SSIS, Apache NiFi, Talend, AWS Glue, Apache Kafka, Databricks, Google DataProc, Sqoop, Python, PowerShell, Shell Scripting

Data Storage & Lakes

AWS S3, Azure Blob Storage, Google Cloud Storage, Data Lake, Delta Lake, Parquet, HIVE, HBase, Snowflake

Data Warehousing

AWS Redshift, Snowflake, BigQuery, SQL Server, PostgreSQL, MongoDB, DB2

Performance Tuning

ETL Pipeline Tuning, Database Query Tuning, Hadoop Cluster Optimization

Infrastructure Automation

Terraform, AWS CloudFormation, Kubernetes, Infrastructure as Code (IaC)

Client:- Computershare – College Station, TX Apr 2023 – Present

Role:- Senior Cloud Data Engineer

Responsibilities:

Collaborated with business analysts to translate reporting requirements into technical specifications, ensuring accurate data delivery.

Gathered requirements for AWS cloud-based infrastructure, focusing on scalability and resource management to support big data analytics and ETL processes.

Analyzed legacy systems for migration to PostgreSQL and AWS environments, ensuring compatibility with big data and Python-based processing.

Designed AWS cloud-based infrastructure using Terraform, managing resources to support big data analytics and ETL processes.

Architected scalable ETL pipelines using Python and AWS services (such as AWS Glue, Lambda, and S3) for efficient data extraction, transformation, and loading.

Developed Data Lake infrastructure on AWS S3, optimized for big data storage and processing, leveraging AWS Glue and AWS Athena for data querying.

Designed and implemented data models in AWS Redshift and Power BI to enable comprehensive data visualization and self-service BI for business units.

Utilized Terraform to automate the creation of AWS resources, ensuring consistent and efficient deployment of big data environments.

Developed ETL pipelines using Python and AWS services, automating data ingestion, transformation, and loading into AWS S3 and Redshift, ensuring data availability for big data analytics.

Integrated Python scripts in AWS Lambda to automate data processing and real-time data transformations within the ETL pipeline.

Leveraged Terraform for infrastructure as code, automating the deployment of AWS resources needed for ETL processes and big data management.

Created Power BI dashboards linked to AWS Redshift and other big data sources, providing real-time insights and comprehensive data visualization.

Developed advanced DAX queries in Power BI for data modeling, enabling deep analysis and reporting on big data.

Built complex T-SQL queries and procedures in AWS Redshift to support efficient financial data processing and analysis.

Implemented machine learning models and NLP techniques using Python on AWS to enhance data-driven insights, particularly in fraud detection and customer analytics.

Automated data preprocessing using Python to clean and prepare raw big data for analysis and reporting in Power BI.

Developed custom Python scripts for ETL processes, ensuring efficient data integration and transformation across multiple AWS services.

Leveraged advanced Tableau skills, integrating Python scripts to process and visualize big data efficiently.

Conducted performance tuning on ETL pipelines in AWS, optimizing Python scripts and SQL queries for faster data processing and reduced latency in big data environments.

Enhanced database performance in AWS Redshift through query optimization and efficient use of big data processing frameworks.

Troubleshot Terraform configurations to ensure accurate and reliable deployment of AWS resources supporting ETL and big data workflows.

Validated data integrity and accuracy through extensive testing of ETL pipelines, ensuring data consistency across AWS services and Power BI dashboards.

Deployed ETL pipelines on AWS, automating data flows from on-premises and cloud sources into AWS S3 and Redshift, leveraging Python and Terraform for efficient deployment.

Implemented Power BI dashboards connected to AWS Redshift, providing real-time reporting and analysis capabilities on big data.

Automated AWS infrastructure deployment using Terraform, ensuring consistent and scalable environments for big data processing and ETL operations.

Deployed machine learning models and NLP algorithms on AWS, integrating Python scripts for real-time data analysis and insights.

Managed the migration of legacy systems to AWS, ensuring seamless data transition and integration with ETL processes.

Managed and optimized ETL pipelines in AWS using Python and Terraform, ensuring ongoing efficiency and scalability for big data processing.

Monitored and maintained AWS resources, using Terraform to manage infrastructure changes and updates, ensuring optimal performance for ETL and big data operations.

Updated and enhanced Power BI dashboards, continuously integrating new data sources and refining data models to improve reporting accuracy and insights.

Administered AWS Redshift and other cloud databases, ensuring high availability and performance for big data analytics.

Conducted regular audits of AWS environments using Python scripts and Terraform, optimizing resource usage and maintaining compliance with security standards.

Environment: AWS Cloud, AWS S3, AWS Redshift, AWS Glue, AWS Lambda, AWS Athena, AWS CloudFormation, Terraform, Python, SQL (T-SQL), DAX, NLP, IaC, ETL Pipelines, Data Lake, Data Modeling, Power BI, Tableau, SSRS, PostgreSQL, DB2, Data Querying, Machine Learning & AI, Data Security, Compliance Audits

Client:- Cisco – San Jose, CA Mar 2022 – Mar 2023

Role:- Senior Data Engineer

Responsibilities:

Collaborated with clients to tailor telecom solutions, enhancing retention and satisfaction through optimized data handling on GCP.

Translated business requirements into Tableau dashboards and integrated Power BI solutions into workflows, utilizing GCP’s data services for backend support.

Conducted risk assessments and underwriting for telecom services, utilizing data stored in GCP’s BigQuery for comprehensive analysis.

Ensured data accuracy and integrity across multiple systems using PL/SQL and Python for rigorous validation and cleansing.

Designed and optimized data models using MicroStrategy Architect on GCP, ensuring scalability and performance for large datasets.

Designed and implemented database systems using T-SQL on GCP’s BigQuery and Snowflake, ensuring seamless integration with existing telecom systems.

Created stored procedures, functions, and triggers in PL/SQL to support telecom applications on GCP’s BigQuery and Snowflake.

Designed flexible data models in MongoDB on Linux, facilitating agile management of telecom services and billing operations.

Developed interactive dashboards and reports in Power BI and Tableau for actionable insights into key metrics and trends across telecom data.

Developed SSAS Tabular Models from raw data stored in Snowflake, building comprehensive analytical frameworks for telecom data.

Implemented role-based security models in Power BI and Tableau to protect sensitive information and ensure compliance with industry regulations.

Automated SSAS tabular model cube processing with XMLA queries in SSIS, streamlining data analysis and reporting functions on GCP.

Utilized Power BI for financial modeling, predictive analytics, and customer segmentation, with data sourced from Snowflake and processed using PL/SQL.

Integrated Python with external APIs on GCP to enrich telecom data analysis, enhancing the quality and depth of insights.

Developed custom APIs with GCP API Gateway for seamless communication and data exchange in telecom systems, supporting real-time decision-making.

Automated tasks using GCP CLI and Terraform, reducing errors and improving efficiency in managing telecom infrastructure on GCP.

Implemented data cleansing best practices using Power Query functions and Python scripts to ensure data integrity and consistency across all reports.

Enhanced query performance in MongoDB on Linux, improving user satisfaction and data retrieval speeds.

Streamlined ETL processes using Talend and Python, ensuring data quality and consistency.

Applied Python libraries for in-depth data analysis and visualization on GCP, enabling valuable insights for decision-making in telecom.

Ensured seamless deployment of SSRS reports on GCP, improving data access and decision-making for telecom operations.

Managed successful data migrations from legacy systems to MongoDB on GCP, ensuring smooth transitions and data integrity.

Replaced legacy Crystal Reports with SSRS, integrating them with GCP’s BigQuery for real-time data access and deployment.

Implemented advanced DAX Queries to enhance Power BI's data analysis capabilities and published reports to the Power BI service on GCP.

Administered MongoDB databases on GCP, supporting complex network, customer, and service data within the telecom sector.

Enhanced data storage and retrieval by optimizing databases on GCP using BigQuery, SQL Server, and Snowflake.

Implemented robust monitoring systems on GCP using Stackdriver to ensure reliability and availability of resources for telecom operations.

Worked with DBAs to maintain database integrity and consistency on GCP’s BigQuery and Snowflake, optimizing systems for high performance.

Environment: GCP, BigQuery, Cloud Storage, Cloud Functions, GCP API Gateway, Stackdriver, Snowflake, SQL Server, MongoDB, Linux, SSAS, ETL, Talend, Apache Spark, Python, PL/SQL, T-SQL, Tableau, Power BI, MicroStrategy Architect, SSRS, Crystal Reports, DAX, Terraform, GCP CLI, SSIS, XMLA Queries, External APIs

Client:- Omega Healthcare – Boca Raton, FL Jan 2021 – Feb 2022

Role:- Senior Data Engineer

Responsibilities:

Developed efficient ETL pipelines using SSIS for processing complex transformations on fact and dimension tables, integrating with Hadoop clusters and GCP services.

Implemented For Each Loop Containers and custom scripts to automate repetitive tasks, enhancing efficiency in data processing.

Streamlined data movement, transformation, and validation processes to seamlessly integrate diverse data sources using Apache NiFi and Talend.

Leveraged data pipelines with Apache Kafka for real-time data streaming and integration with Databricks for scalable data processing.

Managed databases across various RDBMS, including Oracle 19c, SQL Server, MySQL, and Snowflake, optimizing performance and scalability.

Configured and managed data artifacts such as data sources, data source views, cubes, dimensions, and hierarchies in Snowflake and Oracle.

Implemented and optimized database solutions using PL/SQL, T-SQL, and Snowflake, ensuring seamless integration and high performance.

Utilized GCP BigQuery and Azure Blob Storage for large-scale data storage and efficient data retrieval.

Created and optimized interactive dashboards and reports using Power BI, Tableau, and MicroStrategy, integrating data from BigQuery, Snowflake, and Azure Data Factory.

Developed and deployed OLAP cubes and SSAS Tabular Models for comprehensive analytical frameworks, utilizing Spark and Databricks for advanced analytics.

Utilized DAX Queries and Power Query functions in Power BI to enhance data analysis capabilities, ensuring compliance and accuracy.

Integrated with GCP Dataflow and Azure HDInsight for real-time data analytics and visualization, providing actionable insights into key metrics.

Designed and implemented ETL processes using Apache NiFi, Talend, and AWS Glue, transferring data from heterogeneous sources and optimizing data workflows.

Developed cloud-based ETL mappings and pipelines in Databricks and Google DataProc, ensuring efficient data processing and integration.

Utilized Sqoop and custom Python scripts for data extraction, transformation, and loading, ensuring data consistency across platforms.

Implemented ETL automation with Terraform and Kubernetes, improving efficiency and scalability in managing data infrastructure.

Collaborated with Sales teams and Customers to enhance forecast accuracy and address inventory-related matters using advanced data analytics.

Maintained and optimized supply chain data with tools like GCP BigQuery and Azure Data Factory for effective stock planning and inventory management.

Monitored and managed Supply Chain KPIs, leveraging Python and Spark Streaming for real-time analytics and reporting.

Developed and managed Power BI data models and Tableau dashboards for property and casualty insurance, ensuring actionable business intelligence.

Orchestrated data migrations from legacy systems to modern PostgreSQL databases and Snowflake, ensuring data integrity and operational continuity.

Analyzed insurance policy data using Python, Scala, and Spark for insights into customer demographics, policy performance, and risk assessment.

Created dynamic dashboards for real-time monitoring of insurance claim trends, leveraging GCP BigQuery and Azure Blob Storage.

Utilized Python, Pandas, and Spark for advanced data analysis and visualization, creating comprehensive reports and dashboards with Power BI and Tableau.

Developed backend programs and data pipelines with Shell Scripting and MapReduce, integrating with HIVE and Druid for large-scale data processing.

Created and optimized SSRS reports and interactive dashboards, ensuring accuracy and efficiency in data presentation and analysis.

Designed and implemented scalable database solutions using AWS Redshift and AWS RDS, integrating with BigQuery and Azure for enhanced data management.

Created and managed ETL pipelines in AWS Glue, ensuring effective data integration and error handling.

Optimized Tableau Server performance and managed large volumes of data using AWS infrastructure and GCP services.

Provided end-user training and support for Business Intelligence tools, including Power BI, Tableau, and Business Objects.

Configured and managed Business Objects deployment, including user setup and automated job scheduling for database backup and maintenance.

Environment: ETL, SSIS, Apache NiFi, Talend, AWS Glue, Databricks, Google DataProc, Sqoop, Oracle 19c, SQL Server, MySQL, Snowflake, GCP BigQuery, Azure Blob Storage, Apache Kafka, Spark, PySpark, Google DataFlow, Azure HDInsight, HIVE, HBase, Druid, Delta Lake, Parquet, MapReduce, Shell Scripting, Power BI, Tableau, MicroStrategy, SSRS, Crystal Reports, GCP, AWS, Kubernetes, Terraform, Stackdriver, AWS Redshift, AWS RDS, Google Cloud Storage, GCP Composer, Python, PL/SQL, T-SQL, Scala, Pandas, DAX, Business Objects, Zookeeper, Hadoop cluster, Digital Signage, Hadoop log files

Client:- XL ENERGY LTD – India May 2015 – Apr 2020

Role:- Data Warehouse Developer / ETL & Reporting Specialist

Responsibilities:

Designed and maintained data warehouse solutions using SSIS for ETL processes and SSRS for reporting. Ensured data accessibility, integrity, and high performance for analytical purposes.

Implemented data quality checks and procedures to maintain high standards of accuracy and reliability in data processing. Utilized Python and SQL to perform data validation and cleansing.

Designed and implemented ETL workflows using SSIS to automate data extraction, transformation, and loading processes. Developed complex SSIS packages to handle diverse data sources and ensure efficient data movement.

Wrote complex queries, stored procedures, and functions in T-SQL for data manipulation, validation, and optimization within SQL Server environments. Leveraged SQL Server as the relational database management system (RDBMS) for managing structured data.

Utilized Python for advanced data transformations, statistical analysis, and integration with external APIs. Enhanced data processing capabilities by employing Python libraries for data manipulation and analysis.

Integrated with external APIs using Python for data fetching and exchange. Employed Python for seamless interaction with various APIs to enrich data analysis and reporting capabilities.

Used Python libraries for data visualization and reporting, facilitating enhanced decision-making processes in energy sector projects. Developed dynamic visualizations to represent complex datasets effectively.

Extracted data from diverse sources including databases, data lakes, and APIs. Employed SQL and Python for data extraction, ensuring comprehensive data integration from various platforms.

Created interactive reports and dashboards with SSRS to provide actionable insights for stakeholders. Used SSRS for detailed data analysis and reporting.

Environment: SSIS, SSRS, Python, SQL, T-SQL, SQL Server (RDBMS), APIs, Python libraries

Client:- LIC India – India Jan 2015 – Apr 2015

Role:- ETL Developer / SQL Developer

Responsibilities:

Gathered requirements and supported strategic planning for Master and Customer Data Management.

Developed and optimized complex SQL queries for report generation.

Created and managed T-SQL components: Stored Procedures, Triggers, Tables, Views, and UDFs.

Utilized Constraints, Indexes, and Isolation Levels for database management.

Built ETL processes with SSIS, including data flow and control flow components.

Collaborated with the Finance team on General Ledger and PeopleSoft files.

Generated policy transaction letters and managed data conversion and load processes.

Automated ETL jobs with Windows Task Scheduler, SQL Server Agent, and UC4.

Applied Agile practices: Pair Programming, Code Reviews, Unit Testing.

Developed and maintained SSRS reports and managed deployment across environments.

Used expressions in SSRS for report creation and handled production code fixes with minimal impact.

Environment: SQL, T-SQL, SSIS, SSRS, Windows Task Scheduler, SQL Server Agent, UC4, Agile practices, Pair Programming, Code Reviews, Unit Testing, General Ledger and PeopleSoft.

Educational Details:

Master's in Business Analytics – University of North Texas


