
Data Engineer Quality

Location:
Tampa, FL
Salary:
60000
Posted:
October 15, 2025


Resume:

SAI PAVAN KOUSHIK AJMIRA

Data Engineer | ****************@*****.*** | +1-813-***-**** | Tampa, Florida | LinkedIn

PROFESSIONAL SUMMARY

Results-driven Data Engineer with over 4 years of experience designing, building, and optimizing ETL/ELT pipelines, data lakes, and data warehouses for large-scale enterprise environments, including Fortune 500 companies. Skilled in Apache Spark, AWS, Azure, Databricks, Kafka, SQL, Python, and BI tools such as Power BI and Tableau. Proven ability to improve data processing performance, automate workflows, and ensure data quality, governance, and compliance with GDPR, HIPAA, and SOX. Adept at collaborating in Agile/Scrum teams to deliver high-impact analytics solutions that enhance decision-making and operational efficiency. Holds a Master’s in Computer and Information Science and multiple cloud certifications.

PROFESSIONAL EXPERIENCE

MOLINA HEALTHCARE – Remote, USA | Data Engineer | Oct 2024 – Present

Designed and deployed modular ELT pipelines with dbt to transform raw data into analytics-ready models, improving transparency, reusability, and documentation for downstream users.

Built scalable ingestion and transformation workflows on BigQuery, leveraging partitioning and clustering to reduce query costs by 30% and accelerate reporting for analytics teams.

Migrated legacy ETL processes to Snowflake, implementing efficient warehouse scaling and data-sharing capabilities that improved cross-team collaboration and reduced data silos.

Developed and automated MLOps pipelines for model training, validation, and deployment, ensuring reproducibility, version control, and CI/CD integration across machine learning workflows.

Partnered with cross-functional teams to design real-time data dashboards powered by dbt and BigQuery, reducing reporting latency and enabling near-live decision-making.

Implemented data quality monitoring using dbt tests and CI/CD workflows, catching anomalies early and ensuring confidence in production datasets.

HCL TECHNOLOGIES – Hyderabad, India | Associate Data Engineer | Jan 2021 – Dec 2022

Developed and maintained 15+ high-volume ETL/ELT pipelines using Apache Spark, AWS Glue, and Databricks. These pipelines processed over 2TB of data daily from APIs, RDBMS, and flat files, creating reliable data lakes and warehouses for analytics teams.

Managed large Hadoop-based big data ecosystems containing over 100M records; adopting Parquet and ORC columnar formats improved storage efficiency by 25% and made data retrieval significantly faster.

Deployed solutions across AWS EMR, Redshift, and S3 to support advanced BI and analytics workloads. This ensured data scientists and analysts could run complex models without performance bottlenecks.

Automated 50+ workflows with Apache Airflow and AWS services. The automation reduced manual intervention by 40% and improved SLA adherence by 30%, freeing up engineering capacity for innovation.

Implemented cloud security best practices including IAM roles, KMS encryption, and RBAC policies. Combined with Terraform-based infrastructure automation, this ensured full compliance with organizational and industry standards.

Designed a real-time streaming architecture with Kafka and AWS Kinesis. This enabled the data science team to deliver insights 40% faster, which directly impacted business outcomes in fraud detection and customer analytics.

Enhanced monitoring by integrating AWS CloudWatch and Datadog, which reduced pipeline downtime incidents by 30% and improved issue resolution times.

Conducted peer code reviews and knowledge-sharing sessions, raising team efficiency and cutting onboarding time for new engineers by 20%.

PERSISTENT SYSTEMS – Pune, India | Data Engineer | Jun 2020 – Nov 2020

Created a Python/SQL-based data quality framework that achieved 99% accuracy. This eliminated recurring ETL failures and established confidence in downstream reports for business stakeholders.

Modernized legacy batch workflows into Apache Spark pipelines. The migration boosted data throughput by 3x and cut inconsistencies by 40%, enabling smoother and more reliable reporting.

Established data governance practices with Collibra to ensure GDPR compliance. This not only improved stakeholder trust by 30% but also positioned the company as a compliant and responsible data custodian.

Automated 25+ reporting workflows using Python and Airflow. The automation reduced turnaround time from 24 hours to under 2 hours, making critical reports available same-day for executives.

Containerized data services using Docker and deployed them on AWS EC2 and EMR. The approach increased scalability by 50% while reducing operational costs by 20%.

Standardized schema version control for PostgreSQL and MySQL environments. This achieved 100% deployment success rates and eliminated rollout failures across environments.

Introduced CI/CD pipelines for data jobs using Jenkins and Git, which improved deployment reliability and cut delivery cycles by 40%.

Collaborated with data science teams to supply cleansed and structured datasets, accelerating ML model training cycles by 25% and improving prediction accuracy.

TECHNICAL SKILLS

Programming & Scripting: Python (Pandas, PySpark, SQLAlchemy), SQL, Shell Scripting, Scala, R, Java (for Spark/Hadoop integration)

Big Data & ETL: Apache Spark (Batch & Streaming), Databricks, Hadoop (HDFS, YARN, MapReduce), Hive, Pig, AWS Glue, dbt (Data Build Tool)

Streaming & Messaging: Apache Kafka, AWS Kinesis, Spark Structured Streaming, Flink (basic exposure)

Data Warehousing & Databases: Amazon Redshift, Snowflake, Google BigQuery, PostgreSQL, MySQL, Oracle, SQL Server

Data Quality & Governance: Great Expectations, Collibra, Data Catalogs, Schema Version Control, Data Lineage, GDPR/CCPA Compliance

Cloud Platforms: AWS (S3, EMR, Lambda, Glue, RDS, IAM, CloudWatch, Athena), Azure Data Factory, Azure Synapse, GCP BigQuery, Terraform (IaC)

Workflow Orchestration: Apache Airflow, AWS Step Functions, Prefect, Luigi

Containers & CI/CD: Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps Pipelines

Visualization & BI: Tableau, Power BI, Looker, QuickSight

Other Tools & Practices: Agile/Scrum, DataOps, Automated Testing, Performance Tuning, Cost Optimization, MLOps (basic exposure to MLflow, SageMaker)

EDUCATION & CERTIFICATIONS

Master’s in Computer and Information Science – Saint Leo University, USA | Jan 2023 – Aug 2024

Bachelor’s in Computer Science and Engineering – Geethanjali College of Engineering and Technology, India | Jun 2017 – May 2021

Microsoft Azure Data Fundamentals – 2023

Microsoft Fabric Data Engineer Associate
