Jhansi Sreya J — Data Engineer
+1-469-***-**** *********@*****.***
PROFESSIONAL SUMMARY
Seasoned Data Warehouse Engineer with around 5 years of experience in designing, implementing, and optimizing scalable data solutions.
Demonstrated expertise in Shell Scripting for automating complex data extraction, transformation, and loading processes within Linux environments.
Deep hands-on experience with Oracle development, including Exadata, for robust relational database management and performance tuning.
Skilled in implementing and managing Linux-based infrastructure, ensuring high availability and optimal performance for data warehousing operations.
Proficient in Python for developing sophisticated ETL pipelines, data analysis, and system automation scripts, complemented by Perl knowledge.
Experienced in enhancing ETL and database load/extract processes, focusing on efficiency and data integrity across diverse data sources.
Adept at identifying and implementing system and architecture improvements to optimize data warehousing landscapes for future scalability.
Extensive practical experience with relational databases, specifically Oracle Exadata, ensuring secure and high-performing data storage.
Strong understanding of Unix file systems, including mount types, permissions, and standard tools, for effective system administration.
Proven ability to leverage orchestration tools like Apache Airflow with Python to manage and schedule complex data workflows.
Committed to automation and continual process improvement, consistently enhancing Linux-based toolsets, scripts, and operational jobs.
Collaborative professional with strong Agile methodology experience, contributing to cross-functional teams and ensuring clear communication.
TECHNICAL SKILLS
Programming Languages: Python, Shell Scripting, Perl, SQL, Scala, Java
Operating Systems & Scripting: Linux, Unix File Systems, Standard Unix Tools, Bash, PowerShell
Data Warehousing & Databases: Oracle Exadata, Oracle, Snowflake, Amazon Redshift, PostgreSQL, MySQL, SQL Server, Teradata, DynamoDB
ETL & Orchestration Tools: Apache Airflow, Informatica PowerCenter, AWS Glue, Azure Data Factory, Apache Spark, Databricks
Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure
Version Control & Project Management: Git, GitHub, JIRA, Confluence
Data Processing & Formats: PySpark, Spark SQL, Pandas, Delta Lake, Parquet, JSON, CSV
WORK EXPERIENCE
Senior Data Engineer @ Aetna | Hartford, CT | Sep 2024 – Present
Implemented and configured Linux-based infrastructure to manage complex data warehousing processes for healthcare claims, ensuring optimal performance.
Developed robust Shell scripts for automating data ingestion, transformation, and validation routines across diverse healthcare datasets within a Linux environment.
Designed and optimized Oracle Exadata database schemas for high-performance analytical workloads, supporting critical healthcare data analysis.
Enhanced ETL processes using Informatica PowerCenter for ingesting data from Oracle databases and REST APIs into Amazon S3, ensuring data quality and governance.
Managed and maintained various Linux-based toolsets, scripts, and jobs to streamline data pipeline operations and reduce manual intervention.
Orchestrated complex data workflows using Apache Airflow with Python, scheduling and monitoring all ETL jobs in a Linux server environment.
Identified and implemented system and architecture improvements within the AWS ecosystem, enhancing scalability and efficiency of data warehousing solutions.
Developed sophisticated Python scripts for data preprocessing, feature engineering, and supporting machine learning model deployments on GCP Vertex AI.
Defined and enforced stringent data quality rules, designing comprehensive test cases and performing validation checks across ingested datasets.
Ensured strict data governance and HIPAA compliance by adhering to IAM and encryption guidelines for sensitive healthcare information.
Managed ETL code deployments across development and production environments using Jenkins, maintaining version control with Git for auditability.
Collaborated with cross-functional Agile teams, driving sprint grooming sessions and coordinating deliverables to meet project timelines.
Technologies Used: AWS, Linux, Oracle Exadata, Shell Scripting, Apache Airflow, Python, PySpark, Informatica PowerCenter, Spark SQL, Oracle, DynamoDB, Jenkins, Tableau, Google Looker, GCP Vertex AI, JIRA
Data Engineer @ Wells Fargo | San Francisco, CA | May 2022 – Jul 2023
Managed Linux-based environments for critical financial transaction data processing, ensuring system stability and security.
Developed and optimized PySpark and Scala applications on Databricks clusters for batch processing of large financial datasets, leveraging Linux tools.
Crafted advanced Shell scripts to automate data migration from legacy Oracle databases to Snowflake on AWS S3, improving pipeline efficiency.
Implemented sophisticated transformation logic using Snowflake stored procedures, SnowSQL, and Oracle SQL to prepare reporting layers.
Built robust data ingestion pipelines to load on-premises Oracle records into Snowflake, integrating with Linux file systems for data staging.
Configured and maintained Linux services supporting ETL tools, including integration with Informatica PowerCenter for efficient data transformations.
Partitioned and optimized large Snowflake datasets for superior query performance and reduced operational costs in the financial data warehouse.
Applied row-level security and column masking policies in Snowflake, ensuring strict compliance with financial data regulations and governance.
Supported Jenkins CI/CD deployments and managed code versioning using GitHub, adhering to best practices for data engineering projects.
Wrote reusable Python scripts to automate scheduling and execution of data pipeline tasks, enhancing overall operational efficiency.
Performed thorough data quality checks on Snowflake tables using SQL assertions to validate record counts and data integrity.
Participated actively in Agile sprint ceremonies, documenting requirements in JIRA and Confluence for transparent project tracking.
Technologies Used: Linux, Oracle, Shell Scripting, Informatica PowerCenter, Databricks, PySpark, Scala, Snowflake, Teradata, AWS S3, Delta Lake, Parquet, Python, Jenkins, Tableau, GitHub, JIRA, Confluence
Junior Data Engineer @ Walmart | Bentonville, AR | Nov 2019 – Apr 2022
Assisted senior engineers in configuring and managing Linux servers for Apache Spark and Hive ETL workflows, handling retail data.
Wrote PySpark scripts to extract and load data from Teradata into HDFS staging tables, ensuring efficient data processing on Linux.
Developed and optimized Shell scripts on Linux servers to automate CSV and pipe-delimited file movement into HDFS for ingestion.
Created HiveQL and SQL queries to pull and filter retail sales and inventory datasets for various reporting and analytics teams.
Monitored Apache Airflow DAGs and flagged scheduling issues to the data engineering team, supporting robust data orchestration.
Performed source-to-target reconciliation checks between Teradata and Hive target tables, validating data integrity post-transformation.
Assisted in upholding Unix file system standards, managing permissions and file structures for secure data storage and accessibility.
Reviewed Linux pipeline execution logs and reported recurring failures to the team lead, contributing to operational stability.
Queried Teradata and Oracle tables to extract and deliver datasets for Tableau reporting, efficiently fulfilling data requests from business stakeholders.
Assisted in verifying star schema fact and dimension table loads in Teradata by running null checks, ensuring data model accuracy.
Maintained data lineage entries and metadata records in the internal catalog under team guidance, enhancing data governance practices.
Committed data pipeline code to Git and opened pull requests for team review, adhering to version control best practices for collaboration.
Technologies Used: Linux, Shell Scripting, Apache Airflow, Oracle, Google BigQuery, Apache Spark, PySpark, Hive, Teradata, SQL, Apache Zeppelin, Python, Git, Confluence
EDUCATION
Master of Science in Computer Science @ University of New Haven