Achyuth Reddy Rodda
Data Engineer
Birmingham, AL | 205-***-**** | *****************@*****.*** | LinkedIn
Summary
Data Engineer with 3+ years of experience designing and developing scalable data solutions across cloud platforms, including Azure and AWS. Proficient in building robust ETL/ELT pipelines with Databricks and Azure Data Factory, supporting both batch and real-time processing with PySpark, Spark SQL, and Kafka. Adept at implementing Medallion Architecture and managing structured and semi-structured data with Delta Lake and Delta Live Tables (DLT). Familiar with data lake storage systems, orchestration tools, and NoSQL databases such as MongoDB, DynamoDB, and HBase. Proven ability in data governance, metadata management, and securing enterprise data using Unity Catalog and cloud-native controls. Strong programming background in Python, SQL, and Java, with experience in API integrations and building insightful dashboards in Power BI. A collaborative team player with excellent communication and problem-solving skills, delivering data-driven solutions aligned with business goals.
Experience
Humana, USA Data Engineer Nov 2024 – Current
• Designing and developing ETL pipelines in Azure Databricks, implementing Medallion Architecture for structured data processing.
• Built data ingestion workflows to extract, transform, and load data from source systems into Databricks for analytical processing.
• Automated data workflows using Databricks Workflows and Delta Live Tables (DLT) to improve efficiency and reliability.
• Developed SQL-based ETL queries on source systems to validate output and ensure data integrity before transformation.
• Engineered forecasting models in Databricks to predict future sales, optimizing data-driven decision-making.
• Conducted data validation and testing across multiple layers to ensure accuracy and consistency in analytical outputs.
• Used Python libraries such as Pandas and NumPy for data validation.
• Developed KPI-driven reports and dashboards, transforming raw data into actionable insights for business users.
• Developed visualizations and dashboards using Power BI.
• Provided technical support and debugging expertise, resolving pipeline failures and improving system reliability.
• Performed data validation, testing, and performance tuning of large-scale data processing workloads.
• Developed Apache Spark applications to process data from various streaming sources.
• Worked with multiple data formats, including JSON, CSV, and Excel, ensuring seamless ingestion and transformation in ETL pipelines.
• Created tables and stored procedures, and extracted data using SQL for business users as needed.
• Built and deployed forecasting models in Databricks using PySpark MLlib to predict sales and demand trends.
• Designed machine learning pipelines for classification and regression tasks, improving business insights and planning.
• Applied feature engineering and model validation techniques to enhance accuracy and reliability of predictions.
• Worked in a cross-functional environment, collaborating with business analysts, managers, and leadership teams to align data solutions with business goals.
Environment: Azure (Databricks, Data Lake Storage, Synapse Analytics, Data Factory), Databricks Workflows, Delta Live Tables (DLT), Medallion Architecture, Apache Spark (PySpark, Spark SQL), Delta Lake, JSON, CSV, Excel, Python (Pandas, NumPy), SQL, Unity Catalog, GitHub, CI/CD Pipelines, Power BI.
Cybage Software, India Data Engineer Jun 2020 – Jul 2022
• Utilized Python, SQL, and Scala to extract, transform, and load data from diverse sources into data lakes and data warehouses.
• Established robust data monitoring solutions, reducing system downtime by 25% and ensuring data availability.
• Implemented data encryption and access control measures, ensuring compliance with data security standards and reducing security incidents by 20%.
• Built and maintained data pipelines on platforms such as Kafka, Airflow, and Snowflake to support data operations.
• Conducted data analysis and machine learning using libraries like NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, and TensorFlow.
• Implemented ETL solutions using SSIS and SSAS, automating data workflows and enhancing data quality.
• Developed data visualizations and reports using Power BI and SSRS to aid data-driven decision-making.
• Managed and optimized databases, including MS SQL Server, PostgreSQL, and MySQL.
• Assisted in data-driven decision-making by providing actionable insights and recommendations.
• Implemented data quality checks and automated data cleansing processes, resulting in a 25% reduction in data errors and inconsistencies.
• Conducted machine learning experiments with Scikit-learn and TensorFlow for anomaly detection and predictive analytics.
• Built data visualization dashboards with predictive insights, enabling proactive decision-making.
• Developed exploratory data analysis (EDA) reports to identify patterns, trends, and data-driven business opportunities.
• Implemented real-time data streaming pipelines using Kafka and Airflow for high-volume processing.
• Engineered ETL solutions with SSIS/SSAS to integrate multiple data sources into Snowflake and SQL Server.
Environment: Python (Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow), SQL, Scala, Apache Kafka, Apache Airflow, Snowflake, SSIS, SSAS, SSRS, Power BI, MS SQL Server, PostgreSQL, MySQL, Data Lakes, Data Warehouses, Git, JSON, CSV, Excel, Data Encryption & Access Control Tools.
Skills
• Methodologies: SDLC, Agile, Waterfall
• Programming Languages: Python, SQL, R, Scala, Java
• IDEs: PyCharm, Jupyter Notebook, Visual Studio Code
• Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Pig, HDFS
• ETL Tools: SSIS, SSAS, Azure Data Factory, Databricks Workflows
• Cloud Technologies: Azure (Data Lake, Synapse Analytics), AWS (Redshift), GCP (BigQuery)
• Frameworks & Platforms: Kafka, Snowflake, Docker, Airflow, MLflow
• AI / Machine Learning: Machine Learning (Supervised & Unsupervised), Deep Learning (TensorFlow, Keras, PyTorch), Natural Language Processing (NLTK, Hugging Face), Feature Engineering, Model Optimization, Hyperparameter Tuning, MLOps (Model Deployment on Azure & AWS)
• Packages / Libraries: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow, PyTorch, NLTK, Transformers, Statsmodels
• Reporting Tools: Tableau, Power BI, SSRS
• Data Engineering & Processing: Medallion Architecture, Delta Lake, Delta Live Tables (DLT), Spark Streaming, Real-time Data Pipelines (Kafka, Flink), Data Modeling (Star & Snowflake Schema), Data Governance (Unity Catalog, RBAC, Data Encryption)
• Databases: MS SQL Server, PostgreSQL, MySQL, MongoDB, DynamoDB, HBase
• Other Tools: CMS, Jira, Confluence, Jenkins, Git, MS Office, ERP, CRM
• Additional Skills: Data Cleaning, Data Wrangling, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving
• Operating Systems: Windows, Linux
Education
Master of Science in Computer Science, University of Alabama at Birmingham, Alabama Jan 2023 – Dec 2024
Bachelor of Computer Science, Bharath University, Chennai July 2018 – July 2022
Certifications
• Microsoft Certified: Azure Data Engineer Associate
• Google Data Analytics Professional