Sai Kiran Reddy Akarapu
*****************@*****.*** +1-872-***-**** USA LinkedIn
Summary
Data Engineer with 4+ years of experience building and optimizing scalable, high-performance data pipelines. Proficient in SQL, Python, and Scala, with expertise in ETL processes, distributed data processing, and big data technologies such as Apache Spark, Hadoop, and Apache Flink. Adept at designing and implementing data governance frameworks that ensure data integrity, security, and quality. Experienced in integrating machine learning, deep learning, and NLP models to enhance data-driven decision-making. Skilled in data visualization with Tableau and Power BI, translating complex datasets into actionable insights.
Technical Skills
Programming Languages: Python, SQL, Scala, Java, R, PL/SQL, Shell Scripting
Big Data & Cloud: Apache Spark, Hadoop (HDFS, Hive), AWS (Redshift, S3, Glue), Azure (Data Factory, Synapse Analytics)
ETL & Data Pipelines: Apache Airflow, SSIS, Informatica, Databricks, Kafka, Flink
Databases: PostgreSQL, MySQL, Oracle, MongoDB, Snowflake, BigQuery, NoSQL
Data Visualization & Analytics: Tableau, Power BI, Google Analytics, Pandas, NumPy
DevOps & Tools: Git, GitHub, Docker, CI/CD, Prometheus, Jira, Agile, Flask, Django
Professional Experience
Data Engineer, PTC 10/2023 – Present Remote, USA
Designed and optimized scalable ETL pipelines using Apache Airflow and GitLab CI/CD, improving data processing efficiency by 25% through automated scheduling and workflow orchestration (see the pipeline sketch after this role).
Integrated and transformed structured and unstructured data from SQL Server, PostgreSQL, APIs, and cloud storage using Azure Data Factory, Python (Pandas, PySpark), and SQL.
Reduced processing time by 30% by implementing ETL workflows using SSIS, Hadoop, and Spark to handle 1TB+ datasets efficiently.
Collaborated with cross-functional teams to integrate Azure Synapse Analytics and Power BI, delivering real-time insights for improved business decision-making.
Developed monitoring and alerting systems using Azure Monitor, Apache Airflow, and Prometheus, reducing ETL downtime and enhancing data pipeline reliability.
Ensured data integrity and governance by implementing best practices for data modeling, validation, and security using Snowflake and Redshift.
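A minimal sketch of the kind of Airflow orchestration described in this role. The DAG id, schedule, file paths, and task logic are illustrative assumptions (Airflow 2.4+ and pandas), not the production PTC pipeline.
```python
# Illustrative sketch only: DAG id, schedule, paths, and task logic are assumed.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Hypothetical extract step: pull a day's records into a DataFrame
    # and stage them locally for the downstream task.
    df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
    df.to_csv("/tmp/orders_raw.csv", index=False)


def transform():
    # Hypothetical transform step: basic cleansing/validation before load.
    df = pd.read_csv("/tmp/orders_raw.csv")
    df = df[df["amount"] > 0]
    df.to_csv("/tmp/orders_clean.csv", index=False)


with DAG(
    dag_id="example_etl_pipeline",   # assumed name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # automated daily scheduling (Airflow 2.4+ arg)
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task   # simple linear dependency
```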
Data Engineer, Sigma InfoTech 11/2019 – 12/2022 Remote, India
Automated financial data workflows using PL/SQL, SSIS, and Apache Airflow, reducing manual data handling by 30% and enhancing operational efficiency.
Designed and deployed interactive financial dashboards using Tableau, integrating Python (Pandas, NumPy) for advanced data modeling and trend forecasting.
Implemented predictive analytics using machine learning (Scikit-learn, TensorFlow) and Monte Carlo simulations, reducing operational costs by 12% and increasing ROI by 15%.
Developed and optimized financial data models using MySQL and Python, improving forecasting accuracy by 25%.
Enhanced data quality and integrity by implementing Python-driven data cleansing, anomaly detection, and validation frameworks, ensuring 99.9% accuracy.
Automated data ingestion and real-time processing using Apache Kafka and AWS Glue, increasing data update efficiency by 40% (a representative streaming sketch follows this role).
Managed and analyzed 1TB+ of financial data using Amazon Redshift, S3, and AWS Glue for secure storage and warehousing.
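Below is a minimal sketch of the kind of Kafka-to-S3 streaming ingestion described in this role, assuming PySpark Structured Streaming with the spark-sql-kafka connector on the classpath. The broker, topic, bucket, and event schema are placeholders, not the actual Sigma InfoTech setup.
```python
# Illustrative sketch only: broker, topic, bucket, and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka_to_s3_ingest").getOrCreate()

# Hypothetical schema for incoming financial events.
schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
])

# Read the raw Kafka stream (broker and topic are placeholders).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "trades")
    .load()
)

# Parse the JSON payload and keep only well-formed, positive-priced rows.
parsed = (
    raw.select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
    .filter(col("price") > 0)
)

# Land the cleaned stream as Parquet on S3 for downstream Glue/Redshift loads.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/trades/")                 # placeholder bucket
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/trades/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```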
Education
Master of Science in Computer Science, University of Missouri-Kansas City 01/2023 – 07/2024 USA
Bachelor of Technology in Computer Science, Bharath University 06/2018 – 07/2022 India