Data Analytics Engineer

Location:

Houston, TX, 77054

Salary:

75000

Posted:

October 15, 2025


Resume:

AKANKSHA SAMINDLA

DATA ANALYTICS ENGINEER

Houston, TX, 77054 +1-832-***-**** ******************@*****.*** LinkedIn GitHub

SUMMARY

Data Analytics Engineer with 5 years of experience designing and implementing cloud-native ETL, streaming, and analytical data solutions across AWS, Azure, and Databricks environments. Proficient in building large-scale data pipelines using AWS Glue Jobs, PySpark, and Databricks, integrating complex sources to deliver real-time insights and data reliability. Expert in developing robust, automated, and compliant architectures leveraging Lambda, Kinesis, Snowflake, Airflow, and dbt, driving observability, performance optimization, and regulatory compliance (HIPAA, GDPR). Skilled in data modeling, orchestration, and CI/CD automation using Terraform, Docker, and GitHub Actions, ensuring consistency and scalability across environments. Certified AWS Cloud Practitioner and Tableau Desktop Specialist with a strong record of cross-functional collaboration, enabling data-driven decision-making and operational efficiency in healthcare and financial domains.

TECHNICAL SKILLS

Programming & Scripting: Python, SQL, R, Shell scripting, Java

Data Formats & Modeling: JSON, XML, Avro, Parquet; Snowflake/Star schemas, Medallion architecture

Data Eng. & Orchestration: ETL pipelines, Change Data Capture (CDC), Apache Airflow, AWS Glue Jobs, PySpark, Databricks, dbt, Hive, SparkSQL, SSIS, SSRS

Cloud Platforms & Services: AWS (S3, Glue, Lambda, Kinesis, Redshift, EC2, CloudWatch, CloudTrail), Azure (Data Factory, Synapse Analytics, Data Lake Gen2, Blob Storage, Azure Fabric), GCP (BigQuery, Dataflow)

Data Warehousing & Databases: Snowflake, Amazon Redshift, Azure Synapse, PostgreSQL, MySQL, MongoDB, Oracle

Machine Learning & AI Frameworks: Scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM, Keras, DVC, MLflow

ML Techniques: Supervised & unsupervised learning, regression, clustering, time series forecasting, feature engineering, A/B testing, model deployment

Visualization & Reporting Tools: Tableau, Power BI, AWS QuickSight, Excel

DevOps, Infrastructure & CI/CD: Git, Jenkins, Docker, Kubernetes, GitHub Actions, Terraform, CloudFormation

Data Governance & Compliance: HIPAA, GDPR, CCPA, role-based access control (RBAC), data masking, encryption (KMS)

EDUCATION

University of Houston USA Aug 2023 – May 2025

Master's in Engineering Data Science

CMR College of Engineering and Technology India Jul 2017 – May 2021

Bachelor of Technology in Electronics and Instrumentation Engineering

PROFESSIONAL EXPERIENCE

University of Houston USA

Data Analytics Engineer May 2024 – Present

Architected scalable end-to-end AWS Glue, Lambda, and Kinesis pipelines integrating Apache Spark Streaming and NiFi, demonstrating problem-solving skills in processing data from 10,000+ IoT sensors daily and improving data throughput by 30%.

Developed and orchestrated PySpark-based AWS Glue Jobs for large-scale data transformations and schema validation, optimizing job runtimes by 40% and enabling real-time analytics for IoT workloads.

Automated schema enforcement and version control leveraging dbt and AWS Glue Catalog, showcasing attention to detail and collaboration, saving over 15 manual hours monthly while boosting pipeline reliability by 25%.

Engineered CDC ingestion workflows from PostgreSQL and MongoDB to Snowflake, applying data partitioning and native cloning techniques to reduce refresh cycles by hours, accelerating analytics delivery.

Implemented robust data lineage, monitoring, and auditing via AWS CloudTrail, CloudWatch, and Elasticsearch, ensuring HIPAA and GDPR compliance through effective cross-team communication.

Orchestrated cloud infrastructure provisioning and deployments using Terraform and GitHub Actions, driving operational efficiency by slashing deployment time from days to under four hours.

Administered RBAC and row-level security policies in Snowflake, collaborating closely with Agile teams to safeguard sensitive healthcare data for 50+ users.

Optimized AWS S3 data layouts and Athena federated queries through fine-grained partitioning strategies, reducing query runtimes by 12 minutes and enhancing anomaly detection speed by 40%.

Capgemini India

Data Engineer/Analyst Jul 2021 – Jul 2023

Designed and maintained robust Apache Airflow DAGs on AWS to orchestrate sensitive healthcare and finance batch workloads, cutting job failures by 12 incidents per week and maintaining high data quality standards for regulated environments.

Built and deployed PySpark ETL scripts as AWS Glue Jobs to transform multi-source healthcare and finance data, achieving 35% faster data ingestion and improved schema consistency.

Developed scalable real-time ingestion pipelines for healthcare and financial analytics use cases, leveraging PostgreSQL, APIs, Kafka, and ETL tools to achieve sub-5-second analytics delivery latency and process over 1 million messages daily.

Established automated data quality frameworks with AWS Glue and Lambda, meeting HIPAA and finance compliance requirements, cutting manual error investigations by 20 hours monthly and boosting stakeholder confidence.

Streamlined SQL transformation workflows with dbt and AWS Glue Catalog on regulatory data, shortening code review cycles from 3 days to 1 day and decreasing bugs by 30%.

Tuned PostgreSQL and MS SQL Server on AWS RDS to optimize reporting for banking and healthcare data systems, improving query response time from 8 to 6 seconds and supporting higher throughput during critical peak loads.

Orchestrated deployment of Docker containers and Kubernetes microservices on AWS EKS for highly secure healthcare and finance platforms, collaborating with DevOps teams to scale system capacity for 3x more concurrent users and reduce downtime by 40%.

Automated environment provisioning using Terraform, accelerating secure project setup from 2 days to 12 hours and ensuring 100% configuration consistency in regulated industry projects.

Collaborated cross-functionally in Agile sprints with business analysts and compliance teams to deliver 15+ new features for healthcare and financial applications, quickly resolving bugs and enhancing platform stability.

Implemented centralized monitoring with AWS CloudWatch, Prometheus, and Grafana, providing real-time incident alerts and compliance visibility with detection times under 3 minutes.

Led data governance and RBAC process improvements that enabled audit readiness, achieving zero compliance issues during healthcare and finance regulatory reviews.

KPMG India

Intern Data Engineer Jul 2020 – Jul 2021

Developed fraud detection pipelines with Azure Data Factory and Databricks (PySpark, SQL), applying feature engineering on millions of records, resulting in a 15% uplift in fraud detection accuracy.

Constructed Medallion architecture layers in Azure Data Lake Gen2 with Delta Lake, shaping clean, governed data for audits and analytics with 99% accuracy through collaborative efforts.

Built streaming ingestion pipelines using Azure Event Hubs and Spark Structured Streaming, enabling real-time fraud alerts and reducing risk exposure time by 30%, highlighting responsiveness and problem-solving.

Automated CI/CD deployments of Azure Synapse SQL pools and Power BI dashboards with GitHub Actions, increasing deployment speed by 40% and promoting efficiency in team workflows.

Established HIPAA-compliant monitoring, logging, RBAC, and data masking using Azure Monitor and Log Analytics, protecting over 100,000 sensitive records with attention to security best practices and teamwork.

PROJECTS

Missing Child Identification System – Python, OpenCV, TensorFlow, ResNet50

Built a CNN-based model achieving 90%+ accuracy in detecting missing children using 5,000+ processed images.

Enabled real-time image ingestion from CCTV and external uploads with law enforcement integration.

Allstate Business Insurance Expert Chatbot (ABIE) – NLP, Dialogflow, JSON

Transformed 10,000+ DITA XML topics into optimized JSON for real-time chatbot use.

Implemented tracking dashboards and metadata enrichment workflows for performance enhancement.

CERTIFICATION

AWS Cloud Practitioner

Tableau Desktop Specialist
