SUSHANTH REDDY
Data Engineer
**************@*****.*** 972-***-**** LinkedIn

PROFESSIONAL SUMMARY:
5+ years of professional experience as a Data Engineer, with a proven track record of designing, implementing, and optimizing Azure- and AWS-based data pipelines.
Expertise in managing large-scale data processing, integration, and analytics in cloud and on-premises environments.
Proficient in data engineering technologies such as Python, SQL, Apache Spark, Kafka, and Hadoop.
Experienced in designing and maintaining ETL pipelines for ingesting, transforming, and loading data from multiple sources into data warehouses and data lakes.
Adept in data warehousing technologies like Snowflake, Synapse Analytics, and Redshift to enable scalable and high-performance analytics solutions.
Familiar with both batch and real-time data processing paradigms using tools like Azure Stream Analytics and Apache Kafka.
Strong understanding of cloud platforms (Azure, AWS, GCP) and experience in building cloud-native data solutions for optimized scalability.
Skilled in data governance, ensuring data quality, consistency, and security across multiple systems.
Hands-on experience in automating repetitive tasks with Apache Airflow, improving workflow efficiency and reducing manual errors.
Designed and implemented data pipelines to process large volumes of financial, healthcare, and customer data across multiple industries.
Strong problem-solving abilities in debugging and troubleshooting data-related issues.
Proficient in creating data models, schemas, and designing efficient databases for reporting and analysis.
Focused on delivering high-quality data solutions by working with cross-functional teams such as data scientists, analysts, and business stakeholders.
Built complex data visualizations in tools like Tableau and Power BI to communicate insights effectively.
Expertise in data integration across various systems, including third-party APIs and legacy systems.
Strong grasp of machine learning pipelines for preprocessing data and feeding it to models for predictive analytics.
Collaborated with data architects to define and implement scalable, efficient, and maintainable data architectures.
Dedicated to enhancing data accessibility, ensuring data can be leveraged effectively by end-users for business decision-making.
Adept at working in Agile environments and delivering timely data solutions with an emphasis on continuous improvement.
Excellent communication skills, capable of interacting with technical and non-technical stakeholders to bridge the gap and ensure project success.

TECHNICAL SKILLS:
Programming Languages: Python, SQL, Java, Scala, Shell Scripting
Big Data Processing Frameworks: Apache Spark, Apache Flink, Apache Hadoop, PySpark
Data Streaming & Real-Time Processing: Apache Kafka, Apache Storm, Azure Stream Analytics
ETL Tools & Data Integration: Apache Airflow, Talend, Informatica, Apache NiFi
Cloud Data Platforms: AWS (Redshift, Glue, S3, EC2, Athena, Lambda), Azure (Data Factory, Synapse Analytics, Databricks, Data Lake, DevOps, Key Vault, Logic Apps), Google Cloud (BigQuery, Dataflow)
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Teradata
Database Management Systems: MySQL, PostgreSQL, MongoDB, SQL Server
Data Lakes & Data Pipelines: Data Lake Architecture, S3-based Data Lakes, Pipeline Automation, ETL Optimization
Data Modeling & Schema Design: Dimensional Modeling, Star Schema, Snowflake Schema, Normalization/De-Normalization
Data Governance & Quality: Data Lineage, Data Validation, Data Quality Checks, Data Security (Encryption, Role-based Access Control)
Version Control & CI/CD: Git, Jenkins, Bitbucket, GitLab, GitHub
Containerization & Orchestration: Docker, Kubernetes, Airflow DAGs, Data Pipeline Automation
Machine Learning Pipelines: Data Preprocessing for ML Models, Scikit-learn, TensorFlow (data feeding)
Batch & Real-Time Data Processing: Batch Jobs, Data Streaming, Event-driven Architectures
Data Visualization & Reporting: Tableau, Power BI, SQL Reporting, Jira, Data Analytics, Data Dashboards
Operating Systems: Linux, Windows, Ubuntu (Data Pipeline Management)

PROFESSIONAL EXPERIENCE:
Cleveland Clinic, Cleveland, OH June 2024 - Present
Data Engineer
Involved in the design and implementation of Azure-based data pipelines to process healthcare data, ensuring compliance with regulatory standards such as HIPAA.
Led the migration of legacy data pipelines to Azure-based solutions, improving system performance and scalability.
Developed and optimized real-time streaming data pipelines using Azure Event Hubs, Azure Stream Analytics, and Apache Spark, improving healthcare data processing efficiency.
Integrated data from various healthcare sources, including patient records, medical devices, and clinical databases.
Collaborated with data scientists to design data models that power advanced analytics and predictive models for patient care.
Automated routine data processing tasks using Data Factory and Azure Logic Apps, significantly reducing manual intervention and error rates.
Designed and implemented a scalable data lake architecture using Azure Data Lake Storage and Azure Data Factory, enabling faster data retrieval and analytics.
Optimized SQL queries and data extraction processes, reducing data retrieval times by 30%.
Developed a robust data governance framework to ensure data quality, integrity, and security in compliance with healthcare standards.
Provided actionable insights into patient care and hospital operations through data-driven visualizations in Power BI.
Worked closely with data architects to refine the overall data infrastructure for scalability and performance.
Built data models in Snowflake to allow for high-performance querying and reporting.
Worked with healthcare teams to design reports and dashboards that track critical metrics like patient satisfaction and operational efficiency.
Conducted regular audits of data pipelines to ensure accuracy and identify potential bottlenecks in the data flow.
Optimized ETL processes, leading to a 40% reduction in processing time for large datasets.
Implemented an automated data validation process using Data Factory and Python scripts, ensuring the consistency and accuracy of incoming data from diverse sources.
Regularly monitored and optimized the performance of data systems using Azure Monitor and Spark UI, identifying bottlenecks, reducing downtime, and improving overall system reliability.
Introduced Git-based data version control in collaboration with the development team, ensuring safe handling of data changes and pipeline updates.
Contributed to the development of a comprehensive data pipeline monitoring dashboard for real-time system performance tracking.
Assisted the data analytics team in cleaning and transforming unstructured data, enabling more accurate analysis and reporting.

Ally Bank, Chennai, India Dec 2020 - Dec 2022
Data Engineer
Built and maintained ETL pipelines to process large-scale transactional and financial data for reporting and decision-making.
Integrated data from internal systems and third-party APIs into a centralized data warehouse built on Snowflake.
Worked closely with business teams to understand data requirements and deliver actionable insights to support key financial decisions.
Designed and developed automated data pipelines using Apache Airflow, reducing manual intervention by 50%.
Implemented real-time data processing pipelines with Apache Kafka for timely analysis of financial transactions.
Developed SQL-based data models and optimized queries for improved performance on large financial datasets.
Implemented data extraction strategies from legacy systems, transforming the data into more usable formats for business teams.
Led the migration of financial data infrastructure to Azure, improving scalability and reducing operational costs.
Worked with data scientists to preprocess financial data using Databricks for machine learning models, focusing on fraud detection and risk analysis.
Developed interactive dashboards in Tableau for senior management to visualize key performance indicators and financial health.
Spearheaded the creation of data warehouse architecture on Azure Synapse to centralize financial and customer data.
Automated routine reporting processes, improving the speed and accuracy of financial reporting for executive teams.
Implemented data versioning practices in Azure DevOps to safeguard data consistency and ensure accuracy across different stages of data processing.
Improved data loading times by optimizing the ETL processes and indexing strategies on large financial datasets.
Collaborated with business teams to integrate customer behavior data, enhancing financial product offerings and marketing strategies.
Established data quality checks throughout the ETL process, ensuring accuracy and reducing errors in financial reporting.
Reduced manual reporting efforts by creating self-service BI tools for internal stakeholders to generate reports on-demand.
Led the integration of external financial datasets, improving customer insights and enriching business intelligence.
Introduced data security protocols, implementing RBAC and data encryption in the ETL pipelines to ensure financial data was handled securely, in compliance with industry standards.
Regularly conducted data audits to ensure the integrity and consistency of financial data across systems.

Equinix, Chennai, India Jan 2018 - Dec 2020
Data Engineer
Developed ETL pipelines for large-scale data processing, enabling seamless integration of data from various sources across Equinix's global data centers.
Worked with Apache NiFi to ingest and transform operational data, feeding it into a centralized data warehouse for analysis.
Built and maintained data models in Amazon Redshift, optimizing storage and access for performance improvements in analytics.
Collaborated with cross-functional teams to design custom data solutions for networking, customer, and data center analytics.
Automated data processing workflows using AWS Step Functions and Airflow, improving pipeline efficiency and reducing manual effort.
Designed data architecture solutions for large-scale data integration, ensuring high availability and minimal downtime.
Implemented data partitioning strategies, improving the query performance of large datasets stored in S3 and Redshift.
Worked with the data analytics team to produce business intelligence reports for internal stakeholders, tracking key performance indicators.
Integrated real-time data streams from networking devices to provide operational insights into data center performance.
Optimized ETL processes to handle increased data volumes, resulting in a 25% performance improvement.
Led the migration of legacy data systems to cloud-based infrastructure on AWS, leveraging Amazon S3, AWS Glue, and Redshift for improved scalability and flexibility.
Ensured that data from network sensors and devices was cleansed and structured for accurate reporting.
Designed and deployed self-service reporting tools for business users, enabling easier access to operational data.
Developed and maintained data pipelines that fed data into reporting tools used by executive teams.
Regularly monitored and fine-tuned data flow processes using AWS CloudWatch, improving the efficiency of real-time analytics.
Collaborated with IT and network teams to integrate AWS monitoring systems, ensuring data was available in near real-time.
Worked on data security measures to protect sensitive customer and operational data across Equinix's systems.
Developed automated alerts to monitor data quality and pipeline health, reducing downtime.
Established data governance protocols to ensure proper data management and consistency.
Conducted data validation exercises, ensuring the accuracy and reliability of the data before integration into systems.

EDUCATION:
Master of Science in Information Systems
University of North Texas, May 2024

CERTIFICATION:
Microsoft Certified: Azure Data Engineer Associate (DP-203)