Sai Kiran
Atlanta, GA +1-706-***-**** ***************@*****.*** linkedin.com/in/Sai Kiran
SUMMARY
Senior Data Engineer with extensive experience in designing and optimizing data pipelines using Hadoop, Spark, and AWS. Proven track record in enhancing query performance by 50% through SQL optimization and automating workflows with Apache Airflow. Skilled in real-time data processing with Kafka and Spark Streaming, driving faster decision-making. Eager to leverage expertise to deliver scalable data solutions and improve analytics capabilities.
TECHNICAL SKILLS
• Data Engineering & Orchestration: Apache Airflow, DBT
• Data Visualization Tools: Power BI, Tableau
• Version Control: CVS, SVN, GitHub, Bitbucket
• IDEs: Eclipse, NetBeans, IntelliJ, Jupyter, PyCharm, RStudio
• Operating Systems: Windows, Unix, Linux
• Cloud Platforms: AWS, Azure
• Databases: Oracle, Microsoft SQL Server, MySQL, DB2, NoSQL, Snowflake, Teradata SQL, RDBMS, MongoDB, Cassandra, HBase, Azure SQL Warehouse, Azure SQL DB, Teradata, PostgreSQL, PL/SQL
• Containerization Tools: Docker, Kubernetes
• Design Tools: UML, Rational Rose, E-R Modeling, Microsoft Visio
• Programming Languages: Python, HiveQL, Scala, SQL, C, Java, Shell Scripting, JavaScript
• Big Data Technologies: Hadoop, MapReduce, PySpark, YARN, HDFS, HBase, Zookeeper, Hive, Hue, Pig, Sqoop, Spark, Oozie, Storm, Flume, Hortonworks Clusters
• Libraries: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn
• Build Tools: ANT, Maven
PROFESSIONAL EXPERIENCE
Ameren Feb 2024 - Present
Senior Data Engineer
• Led the design and implementation of scalable ETL pipelines, reducing data handling time by 30% through efficient use of Python, Apache Spark, and AWS S3.
• Implemented robust data ingestion pipelines using Airflow for automated data extraction from APIs and cloud storage buckets, simplifying integration and reducing manual effort.
• Directed the migration of legacy systems to cloud-based architectures, which increased system uptime by 25% and improved data accessibility and reliability across the organization.
• Enhanced reproducible infrastructure by collaborating with software engineering and DevOps teams, facilitating consistent deployment and maintenance of data platforms for SaaS offerings.
• Collaborated with data scientists and analysts to design advanced data models, improving predictive analytics accuracy using Python and SQL.
• Automated daily data pipeline monitoring and reporting tasks using Airflow, reducing manual intervention and improving system uptime.
• Led performance tuning of complex SQL queries and Spark jobs, enhancing query performance by 20% and reducing processing bottlenecks.
• Ensured adherence to best practices in data engineering, improving team efficiency by 25% through streamlined processes and collaboration.
Brookline Bancorp Inc. Aug 2023 - Jan 2024
Big Data Engineer
• Established data pipelines using Apache Spark and Hadoop to process large-scale financial data, improving data processing speed for real-time reporting.
• Executed ETL processes using Apache NiFi and Python to extract, transform, and load data from multiple sources, reducing data pipeline errors by 25%.
• Maintained data models in Hadoop, AWS Redshift, and PostgreSQL to support decision-making, improving data retrieval times with optimized queries.
• Deployed automated data quality checks, resulting in a 15% reduction in data inconsistencies and improving reporting accuracy for stakeholders.
• Partnered with cross-functional teams to integrate big data solutions with core banking applications, increasing operational efficiency by 20%.
• Led the migration of on-premise data infrastructure to AWS cloud, cutting infrastructure costs while enhancing scalability and security.
• Operated SQL and NoSQL databases to support financial data queries, reducing query processing times.
• Streamlined data visualization processes with Power BI and Tableau, enabling business teams to create reports 40% faster and make decisions sooner.
Amway Aug 2021 - Dec 2022
Hadoop Developer
• Optimized Hadoop-based data pipelines, enhancing processing efficiency using Hive, Pig, and Spark to process large-scale datasets.
• Implemented ETL processes with Hadoop MapReduce and Apache Spark, improving data extraction and transformation times by 30% for improved business insights.
• Integrated Hadoop with HDFS to ensure robust storage and scalability, leading to a 40% reduction in data retrieval time.
• Conducted real-time data analytics on customer transactions using Spark Streaming, boosting reporting speed for faster decision-making.
• Collaborated with teams to design and implement data models on Hadoop, achieving a 35% reduction in query execution time.
• Programmed data ingestion and processing workflows using Apache NiFi, reducing manual data handling effort.
• Managed data integrity and quality using Hadoop ecosystem tools, increasing the accuracy of business reports by 15%.
• Leveraged cloud platforms like AWS and Azure with Hadoop to scale infrastructure, reducing operational costs while maintaining high availability.
EDUCATION
Auburn University at Montgomery Aug 2022 - Dec 2023
Master of Science, Branch
ICFAI University Aug 2019 - May 2022
Bachelor of Business Administration