Sadhwika Reddy Challa — Senior Data Engineer
913-***-**** *******************@*****.***
PROFESSIONAL SUMMARY:
Data Warehouse Engineer with around 5 years of proven expertise in designing, implementing, and managing robust data warehousing solutions.
Proficient in implementing, configuring, and managing complex Linux-based processes and infrastructure critical for data warehousing operations and scalability.
Strong background in enhancing various Linux-based toolsets, shell scripts, scheduled jobs, and processes to optimize data flow and system performance.
Expertise in developing and optimizing ETL/database load and extract processes, significantly improving data ingestion and transformation efficiency.
Extensive practical experience in Linux environment setup and shell scripting for automation, system administration, and data pipeline orchestration.
In-depth practical knowledge of Unix file systems, including mount types, permissions, standard tools, and powerful piping mechanisms for data manipulation.
Skilled in Python programming for data processing, automation, and ETL development, alongside practical experience with relational databases, especially Oracle.
Proven ability to identify and implement strategic system and architecture improvements, enhancing overall data warehouse reliability and performance.
Committed to Agile methodologies, fostering iterative development and continuous improvement in data engineering and warehouse projects.
Passionate about automation and continual process improvement, consistently seeking innovative solutions to streamline data operations and reduce manual efforts.
Experienced with ETL tools, including Informatica, and orchestration tools like Apache Airflow with Python for complex workflow management.
Demonstrates excellent written and oral communication skills, effectively collaborating with cross-functional teams and stakeholders on data initiatives.
WORK EXPERIENCE:
Senior Data Engineer @ CommonSpirit Health, Chicago, IL | Aug 2024 – Present
Designed and implemented scalable Linux-based data architecture on AWS for processing sensitive healthcare claims and patient datasets securely.
Developed advanced shell scripts for automating data ingestion, transformation, and Oracle database load processes within the data warehouse environment.
Built robust ETL pipelines using AWS Glue and PySpark to ingest data from on-prem Oracle databases into an Amazon S3 data lake, ensuring HIPAA compliance.
Orchestrated complex data pipelines using Apache Airflow with Python, managing dependencies and scheduling for efficient batch and streaming data workflows.
Optimized distributed Spark jobs on EMR for large-scale transformation and aggregation of healthcare data, leveraging Linux system utilities for performance.
Loaded processed datasets into Snowflake, optimizing performance through partitioning and clustering strategies, and managed Unix file system permissions for data security.
Developed serverless APIs using AWS Lambda for downstream data consumers, ensuring seamless access to critical healthcare analytics.
Migrated legacy on-prem data warehouse workloads to AWS cloud, implementing IAM policies and KMS encryption to secure PHI data effectively.
Implemented comprehensive data validation, reconciliation, and quality checks, participating actively in Agile sprint planning and project tracking using JIRA.
Technologies Used: Linux, Shell Scripting, Oracle, Apache Airflow, Python, PySpark, AWS (S3, EMR, Glue, Lambda), Snowflake, Tableau, GitHub, Jenkins
Data Engineer @ Citigroup, New York, NY | May 2022 – Jun 2023
Engineered and managed Azure-based enterprise data warehouse solutions for financial analytics, integrating robust Linux processes for data operations.
Developed sophisticated shell scripts to automate data extraction, loading, and transformation processes for financial data workflows across platforms.
Utilized Informatica to design and implement complex ETL pipelines, ingesting high-volume financial data from SQL Server into Azure Data Lake Storage.
Optimized Oracle database queries and stored procedures, enhancing the efficiency of financial data extraction and reporting mechanisms significantly.
Implemented dimensional modeling techniques in Azure Synapse Analytics and managed Unix file system layouts for optimal organization of staged financial datasets.
Built PySpark jobs using Azure Databricks for processing high-volume financial transaction data, focusing on performance tuning and scalability.
Automated CI/CD pipelines using Jenkins and containerized applications with Docker, ensuring seamless deployment of data warehouse solutions.
Implemented comprehensive security measures, including row-level security and column masking in Synapse, to maintain strict financial compliance.
Built OLAP cubes using Azure Analysis Services and integrated them with Power BI dashboards, actively following Agile Scrum methodology for project delivery.
Technologies Used: Linux, Shell Scripting, Oracle, Informatica, Azure (ADF, ADLS, Synapse, SQL DB), Databricks, Snowflake, Python, Power BI, Jenkins, Docker, GitLab
Junior Data Engineer @ Big Lots, Columbus, OH | Nov 2019 – Apr 2022
Designed and implemented robust data integration solutions for retail sales and inventory analytics within a Linux server environment.
Developed and enhanced ETL workflows using Informatica, extracting data from MySQL and Oracle databases into the enterprise data warehouse efficiently.
Created shell scripts to automate data ingestion processes, transferring CSV and TXT files into HDFS for large-scale retail analytics.
Performed extensive data transformations using Hive on Hadoop (Cloudera), improving data quality and consistency for sales reporting.
Built Spark batch processing jobs for sales reporting, leveraging Python for complex data manipulation and aggregation tasks effectively.
Developed and optimized complex SQL queries and reporting views, ensuring efficient data retrieval for business intelligence dashboards.
Implemented robust data validation and reconciliation processes to maintain accuracy across various retail datasets and reporting cycles.
Managed data access controls and security policies using Ranger on Hadoop, ensuring compliance with stringent data governance standards.
Collaborated with cross-functional teams on requirements gathering and design phases, contributing to the development of robust data solutions in an Agile setting.
Technologies Used: Linux, Shell Scripting, Oracle, Informatica, Hadoop (Cloudera), Hive, Spark, MySQL, Python, Tableau, GitHub
TECHNICAL SKILLS:
Operating Systems: Linux, Unix, Windows
Programming & Scripting: Python, Shell Scripting (Bash, KornShell), SQL
Databases & Data Warehousing: Oracle (Exadata), PostgreSQL, MySQL, Snowflake, Azure Synapse Analytics, Hive, Cassandra, DynamoDB
ETL & Orchestration: Informatica, Apache Airflow, Azure Data Factory, AWS Glue
Cloud Platforms: AWS (S3, EMR, Lambda, Redshift, Athena, IAM, RDS), Azure (ADLS, ADF, Synapse, SQL DB, Event Hub)
Big Data Technologies: Apache Spark (PySpark), Hadoop, Databricks
Version Control & DevOps: Git (GitHub, GitLab), Jenkins, Docker
BI & Visualization: Tableau, Power BI
Methodologies: Agile, Scrum
EDUCATION:
Master of Science in Big Data Analytics and Information Technology @ University of Central Missouri