Kranthi Kasargadda
Data Engineer
Texas *******.**********@*****.*** +1-832-***-****
SUMMARY
Data Engineer with around 3 years of experience in data extraction, wrangling, statistical modeling, and visualization, skilled at turning raw data into actionable insights for informed decision-making.
Proficient in SDLC, Agile, and Waterfall methodologies, with Python, R, and SQL expertise for building robust data solutions and workflows.
Expertise in the Big Data ecosystem, including Hadoop, MapReduce, Apache Spark, Hive, and Pig, enabling efficient processing and analysis of large datasets across distributed environments.
Skilled in ETL tools such as SSIS, Apache NiFi, Apache Kafka, and Talend, enabling seamless data integration and transformation from diverse sources while ensuring data integrity and availability across pipelines.
Experienced in designing scalable AWS data architectures and creating impactful dashboards with Tableau, Power BI, and SSRS to drive data-driven decision-making.
Well-versed in managing MongoDB and MySQL databases, with expertise in schema design, query optimization, and maintaining high performance for structured and unstructured data.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, R, SQL
IDEs: PyCharm, Jupyter Notebook
Big Data Ecosystem: Hadoop, MapReduce, Hive, Pig, Apache Spark
ETL Tools: SSIS, Apache NiFi, Apache Kafka, Talend
Cloud Technologies: AWS, Azure, GCP, Databricks
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow
Reporting Tools: Tableau, Power BI, SSRS
Databases: MongoDB, MySQL, SQL Server, PostgreSQL
Other Tools: Git, GitHub, GitLab
Operating Systems: Windows, Linux, Mac
EXPERIENCE
Tenet Healthcare, USA Dec 2023 – Current
Data Engineer
Worked within Agile and Software Development Life Cycle (SDLC) frameworks to deliver data solutions on schedule, improving project execution and cross-team collaboration.
Applied Python, SQL, and Big Data tools, including Hadoop, MapReduce, and Hive, to process and analyze large datasets, resulting in a 30% reduction in query execution time.
Built data pipelines to orchestrate data flow using PySpark, Kafka, Hive, and Cosmos DB in Azure Databricks.
Productionized ETL pipelines using SSIS and Apache NiFi, improving data integration and transformation efficiency across multiple sources.
Implemented Kubernetes clusters on Azure Kubernetes Service (AKS), enabling seamless integration with CI/CD pipelines for automated deployments and efficient resource management.
Handled large datasets using partitioning, Spark in-memory capabilities, broadcast variables in Spark with Scala and Python, and effective, efficient joins and transformations.
Leveraged NumPy, Pandas, Matplotlib, and SciPy for data analysis and visualization, enabling more accurate and data-driven decision-making processes.
Built Tableau dashboards and optimized SQL Server and PostgreSQL databases, enhancing real-time decision-making and reducing query processing time by 25%.
Implemented version control best practices using GitHub for managing and deploying data projects efficiently across teams.
Genpact, India Jan 2021 – Jul 2022
Data Engineer
Applied Waterfall methodology to deliver structured project execution and meet key milestones promptly.
Enhanced and optimized data pipelines using Pig and Apache Spark, streamlining large-scale data processing for complex business needs.
Developed and maintained ETL pipelines using tools like Apache Kafka, Talend, and Spark, ensuring seamless data flow and reducing processing time by 20%.
Designed and deployed scalable, cloud-based data architectures on AWS and GCP, enhancing system performance and reliability.
Created jobs using Cloud Composer (Airflow DAGs) to migrate data from Google Cloud Storage, transform it with Dataproc, and ingest it into BigQuery for further analysis.
Leveraged Python, Seaborn, and TensorFlow for advanced data analysis and machine learning model development, delivering more accurate predictive insights.
Created dashboards and reports using Power BI and SSRS, providing stakeholders with clear data visualizations.
Managed MongoDB databases, ensuring efficient data storage and retrieval for structured and unstructured data.
Applied Git for version control, ensuring smooth collaboration and efficient code management across multiple teams and projects.
EDUCATION
Master's in Management Information Systems Oct 2023
Lamar University, Beaumont, TX
Bachelor of Engineering in Electronics and Communication Engineering May 2022
R.V.R. & J.C. College of Engineering, Andhra Pradesh, India