
Alex Shrestha

Data Engineer

Clawson, Michigan **017 989-***-**** *****************@*****.*** LinkedIn

PROFESSIONAL SUMMARY

Highly skilled and results-driven Data Engineer with over 5 years of experience designing, building, and operationalizing data lakes, data warehouses, and analytics platforms across GCP, AWS, and Azure. Proven track record in developing scalable ETL/ELT pipelines, real-time streaming solutions, and robust data models to support business intelligence, reporting, and machine learning workflows.

Expert in leveraging modern data engineering tools and frameworks including Apache Airflow, Spark, Kafka, dbt, and Terraform, with hands-on experience in BigQuery, Snowflake, Azure Synapse, and Amazon Redshift. Adept at implementing data governance and cataloging practices using Collibra, Dataplex, and Alation. Strong foundation in cloud-native services, DevOps practices, and CI/CD pipelines, with deep knowledge of SQL, Python, Scala, and Java for data transformation, analysis, and automation. Collaborative team player with experience working in Agile/Scrum environments, delivering high-impact solutions that drive data-driven decision-making and operational efficiency.

TECHNICAL SKILLS

Programming Languages: Python, Java, Scala, SQL

Data Processing Frameworks: Apache Spark, Apache Flink, Apache Beam

ETL Tools: Apache NiFi, Talend, Informatica, AWS Glue

Data Warehousing Solutions: Amazon Redshift, Google BigQuery, Snowflake, Apache Hive

Big Data Technologies: Hadoop, HDFS, MapReduce

Databases (NoSQL and Relational): MongoDB, Cassandra, PostgreSQL

Data Pipeline Orchestration: Apache Airflow, Luigi, Prefect

Cloud Platforms: AWS, Google Cloud Platform (GCP), Microsoft Azure

Data Integration Tools: Apache Kafka, AWS Kinesis

Data Modeling and Design: Star Schema, Snowflake Schema, Normalization and Denormalization

Data Lakes: AWS S3, Azure Data Lake Storage, Google Cloud Storage

Data Visualization: Tableau, Power BI, Looker

Containerization and Orchestration: Docker, Kubernetes

PROFESSIONAL EXPERIENCE

Capital One, McLean, Virginia Jan 2024 – Present

Data Engineer

Responsibilities:

• Designed and implemented scalable data ingestion pipelines to support large-scale analytics solutions.

• Automated end-to-end ETL/ELT processes to ingest, transform, and load data efficiently across various datasets.

• Built and managed data warehouses and data lakes on Google Cloud Platform (GCP), ensuring performance, scalability, and cost-effectiveness.

• Developed and maintained data pipelines using Dataflow for real-time and batch processing.

• Utilized Dataproc to run big data workloads and Spark-based transformations in a distributed environment.

• Orchestrated workflows and data pipelines with Cloud Composer and Airflow, ensuring reliable scheduling and monitoring.

• Applied DevOps practices and tools to support CI/CD pipelines and data infrastructure automation.

• Queried and optimized large datasets using BigQuery, focusing on performance tuning and cost optimization.

• Stored and managed structured and unstructured data in Cloud Storage, integrating it with downstream analytics tools.

• Deployed and managed containerized applications using Docker and Kubernetes (GKE) for scalable data processing.

• Built telemetry data pipelines to extract, structure, and route data to appropriate platforms like Kafka and Splunk for monitoring and reporting.

• Designed and maintained MongoDB databases to support high-performance, schema-flexible data storage for semi-structured and unstructured data.

• Used Collibra for data cataloging and governance, ensuring accurate metadata and data lineage.

• Worked with Dataplex to organize and govern data lakes with unified metadata management.

• Maintained enterprise data catalogs and supported governance initiatives using Alation.

• Created reports and dashboards to monitor usage data, supporting billing transparency and SLA tracking.

Cedar Gate Technologies, Greenwich, CT May 2021 – Dec 2023

Data Engineer

Responsibilities:

• Designed and implemented an enterprise-grade Data Lake on AWS, supporting diverse use cases including scalable data storage, real-time processing, analytics, and reporting of large and dynamic datasets.

• Extracted data from multiple sources including Amazon S3, Redshift, and RDS, and built centralized metadata repositories using AWS Glue Crawlers and AWS Glue Data Catalog.

• Leveraged AWS Glue Crawlers to classify and catalog data from S3, enabling SQL-based analytics using Amazon Athena.

• Developed and optimized ETL pipelines using AWS Glue to ingest and transform data from external sources (e.g., Parquet and CSV files in S3) into Amazon Redshift.

• Authored PySpark scripts within AWS Glue to merge datasets from various tables and automated cataloging with Glue Crawlers for metadata management.

• Implemented monitoring and observability for AWS Glue Jobs and Lambda functions using Amazon CloudWatch with custom metrics, alarms, logs, and automated notifications.

• Migrated on-premises applications to AWS, utilizing EC2 and S3 for data processing and storage, and maintained Hadoop clusters on Amazon EMR.

• Collaborated with business users to gather requirements and translate them into effective Tableau visualizations.

• Engineered real-time data pipelines using Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics, delivering processed data into S3, DynamoDB, and Redshift.

• Designed and developed scalable data pipelines to ingest, transform, and load data into Snowflake, optimizing warehouse performance and enabling advanced analytics and BI reporting.

• Utilized Python for data analysis, transformation, and reporting, employing libraries like Pandas and NumPy for efficient data manipulation.

• Developed interactive and visually compelling Power BI dashboards to support data-driven decision-making across business units.

• Automated routine AWS infrastructure tasks, such as snapshot management and resource cleanup, using Python scripting.

• Installed and configured Apache Airflow and developed DAGs to orchestrate and automate workflows involving AWS S3 and other cloud-native services.

• Managed highly available production environments across multiple Kubernetes clusters, ensuring scalability and resilience.

• Optimized SQL query performance by analyzing execution plans, indexing, and rewriting inefficient queries.

• Utilized Agile project management tools such as Jira and Rally to manage user stories, track progress, and facilitate clear communication across teams.

• Built Spark applications for data validation, cleansing, transformation, and advanced aggregation, leveraging Spark SQL for in-depth analytics.

• Performed comprehensive data integrity checks using Hive, Hadoop, and Spark.

• Enhanced Hive query performance through partitioning, clustering, and use of optimized storage formats like Parquet.

Verisk, Jersey City, NJ Aug 2019 – Apr 2021

Data Engineer

Responsibilities:

• Designed and developed ETL pipelines in Azure Data Factory, automating data ingestion from diverse sources.

• Leveraged Azure Synapse Analytics to create high-performance data warehouses for reporting and analytics.

• Built dbt models to transform raw data into analytics-ready datasets aligned with business logic and reporting needs.

• Utilized Power BI to create visually appealing and insightful dashboards, enabling data-driven decision-making.

• Optimized PostgreSQL database structures and queries, achieving a 35% reduction in query execution time.

• Built real-time data processing solutions in Databricks using Scala, ensuring accurate and timely data delivery.

• Automated infrastructure deployment using Terraform, enabling consistent and efficient environment provisioning.

• Implemented data partitioning strategies in Hive, enhancing performance of large-scale analytical queries.

• Developed batch processing workflows in Java, improving data processing throughput by 25%.

• Worked closely with data scientists and analysts to ensure data pipelines support machine learning workflows and reporting needs.

• Established data quality frameworks, integrating validation checks into data pipelines to ensure accuracy.

• Migrated legacy systems to Azure Synapse, reducing operational overhead and improving scalability.

• Configured and managed Hadoop clusters, ensuring optimal resource utilization and uptime.

• Automated repetitive data processing tasks with Apache Airflow, improving workflow efficiency and reducing manual intervention.

• Designed and developed interactive Tableau dashboards and reports to visualize complex datasets, providing actionable insights to business stakeholders and decision-makers.

• Designed data lake architectures to support unstructured and semi-structured data storage.

• Conducted performance tuning for Databricks notebooks, reducing execution times significantly.

• Provided detailed documentation for ETL workflows and processes, enabling knowledge transfer and maintenance.

• Implemented row-level security in Power BI, ensuring data confidentiality across user groups.

• Conducted POCs for new tools and technologies, recommending best-fit solutions for organizational needs.

• Monitored and resolved production issues in data pipelines, ensuring SLA adherence and minimal downtime.

• Ensured compliance with data governance policies by implementing auditing and encryption protocols.

EDUCATION

Saginaw Valley State University, Saginaw, MI

M.S. in Computer Science & Information Systems


