Yasin Shaik
TX, USA | +1-262-***-**** | **************@*****.*** | LinkedIn
SUMMARY
4+ years of experience designing scalable data pipelines, streamlining ETL workflows, and deploying cloud-native architectures across AWS, Azure, and GCP. Proficient in Airflow, Spark, dbt, and CI/CD practices to ensure fast, reliable data delivery. Skilled in Python, SQL, and BI tools to drive actionable insights. Industry exposure spans telecommunications, healthcare, and finance, with a strong focus on performance tuning, cross-platform orchestration, and maintaining high data quality in production environments.
TECHNICAL SKILLS
Programming Languages & Scripting: Python, SQL, Java, R, Bash
Data Analysis & Visualization: Pandas, NumPy, Matplotlib, Seaborn, Power BI, Tableau, Excel
Data Engineering & ETL: Apache Spark, Apache Airflow, Informatica PowerCenter, Talend, dbt (Data Build Tool)
Database Systems: PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, Oracle
Cloud Platforms: AWS (S3, Redshift, Lambda, Glue), Azure (Data Factory, Synapse Analytics), GCP (BigQuery, Cloud Storage)
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery
Big Data Ecosystem: Hadoop, Hive, HDFS, Kafka
Machine Learning & Statistical Tools: Scikit-learn, StatsModels, SciPy
Version Control & DevOps: Git, GitHub, GitLab, Jenkins, Docker
Workflow Automation & Orchestration: Apache Airflow, Luigi, Cron
CI/CD & Deployment: AWS CodePipeline, Azure DevOps, GitHub Actions
API & Integration: REST APIs, Postman, JSON, XML
Testing & Validation: Pytest, Great Expectations, Data Quality Frameworks
PROFESSIONAL EXPERIENCE
Verizon, USA Jan 2025 – Current
Data Engineer I
• Orchestrated high-volume ETL workflows using Apache Airflow and Spark, accelerating ingestion from AWS S3 to Redshift and reducing nightly batch job delays by 20%.
• Configured dynamic transformation layers with dbt and advanced SQL, decreasing stale data in dashboards by 30% and improving daily query consistency across operational reporting.
• Integrated cloud tools across Azure Data Factory, GCP BigQuery, and AWS Glue, enhancing cross-platform coordination and decreasing processing lag by 15% in distributed pipelines.
• Executed automated tests using Pytest, Great Expectations, and Bash, catching data drift early and elevating production accuracy to 98%, reducing customer-impacting issues.
• Launched CI/CD pipelines via Docker, GitHub Actions, and AWS CodePipeline, speeding deployment turnaround by 40% and minimizing rollback events in version-controlled environments.
Pfizer, India May 2022 – Jul 2023
Data Analyst (Engineer)
• Aggregated clinical data using Informatica PowerCenter and Talend, minimizing manual touchpoints by 25%, while harmonizing records across SQL Server, PostgreSQL, and Oracle systems.
• Utilized Python, Pandas, and NumPy to engineer processing logic, improving reproducibility by 30% in trial analytics pipelines shared across research divisions.
• Assembled live dashboards in Power BI, combining Excel macros and DAX, cutting compliance reporting timeframes by 40% and improving regulator audit preparedness.
• Extracted campaign metrics using Postman and JSON APIs, implementing automated validation for outreach data and lifting real-time visibility by 22% across therapeutic programs.
• Conducted trial diagnostics using StatsModels, SciPy, and Scikit-learn, uncovering statistical anomalies and enabling informed trial modifications across 4 drug portfolios.
Citigroup, India Jan 2021 – May 2022
Jr. Data Analyst
• Automated nightly data loads from MongoDB into MySQL using Apache Airflow and Luigi, improving process stability by 15% and enabling consistent delivery of daily financial reports.
• Crafted visual reports using Tableau, Matplotlib, and Seaborn, shortening stakeholder review cycles by 35% and enriching monthly KPIs with clear graphical narratives for executive dashboards.
• Scheduled validation scripts through Bash and Cron jobs, embedding Python routines for automated field checks, reducing manual auditing workload by over 8 hours per analyst each week.
• Configured streaming ingestion from Kafka into HDFS and Hive, enabling faster fraud detection and increasing transaction-level alert precision by 18% across daily operations.
• Maintained version integrity using Git, GitLab, and Jenkins, minimizing merge conflicts by 25% during collaborative syncs and enhancing deployment accuracy across staging and production environments.
EDUCATION
Master of Science in Management, Data Analytics May 2025
Indiana Wesleyan University, Indiana, USA
Bachelor's in Mechanical Engineering Apr 2022
Vasireddy Venkatadri Institute of Technology