Sushma Suda
Data Engineer
Chicago, IL 801-***-**** ************@*****.***
SUMMARY
Data Engineer with 5+ years of experience designing and optimizing scalable data pipelines and architectures. Proficient in Python, SQL, and cloud platforms (AWS, Azure), supporting data-driven decision-making. Adept at leveraging ETL processes, big data tools (Spark, Hadoop), and data warehousing (Snowflake, Redshift) to deliver high-performance solutions. Successfully led data migration projects, improving processing efficiency by 30% and reducing costs through automation. Strong problem-solving skills with a focus on data quality and system reliability.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, R, SQL, Java
IDEs: PyCharm, Jupyter Notebook, IntelliJ IDEA, Visual Studio
Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, HDFS, YARN, Data Lake, Data Warehouse
Cloud Technologies and Infrastructure: AWS (S3, DynamoDB, Lambda, EC2, Redshift, Glue, Athena, CloudWatch, Aurora, EMR)
ETL & Data Processing Tools: AWS Glue, Apache Airflow
Visualization Tools: Power BI
Streaming & Analytics Tools: AWS Kinesis, AWS Athena
Monitoring & Automation: AWS CloudWatch, AWS Lambda
Packages & Libraries: NumPy, Pandas, Matplotlib, SciPy, Seaborn, TensorFlow
Development and Operations Tools: Git, GitLab, Jenkins (CI/CD), Docker, Kubernetes, Grafana, Jira, GitHub
Databases: MySQL, PostgreSQL, MongoDB, Cassandra, Elasticsearch, RDBMS, NoSQL
EDUCATION
Master of Science in Management Information Systems, May 2024
University of Illinois at Chicago, Chicago, Illinois
EXPERIENCE
ServiceNow, IL May 2024 – Current
Data Engineer
• Designed and implemented ETL pipelines using AWS Glue to extract, transform, and load large datasets from S3 into Redshift, improving data processing efficiency by 30% for business analytics (see the Glue sketch after this list).
• Developed data processing scripts in Python using Pandas and NumPy to clean and preprocess raw data, enabling accurate reporting for downstream applications.
• Utilized Hadoop and MapReduce to process unstructured data in HDFS, optimizing batch processing workflows for a customer analytics platform.
• Built and maintained data models in MySQL for a relational database system, ensuring data integrity and supporting real-time querying for business applications.
• Leveraged Power BI to create interactive dashboards, visualizing key performance metrics for stakeholders, which reduced reporting time by 25%.
• Collaborated with cross-functional teams using Jira and Git for version control, following an Agile methodology to deliver iterative improvements in data pipelines.
• Automated monitoring of data workflows using AWS CloudWatch, setting up alerts to detect and resolve pipeline failures, achieving 99.9% uptime for critical processes (see the alarm sketch after this list).
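A minimal sketch of the S3-to-Redshift Glue job pattern described in the first bullet above; the catalog database, table, column mappings, and connection names are illustrative assumptions, not the production values:

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw dataset cataloged over S3 (names are illustrative)
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="analytics_db", table_name="raw_events")

    # Keep and retype only the columns the downstream reports need
    mapped = ApplyMapping.apply(frame=raw, mappings=[
        ("event_id", "string", "event_id", "string"),
        ("ts", "string", "event_ts", "timestamp"),
        ("amount", "double", "amount", "double"),
    ])

    # Load into Redshift through a Glue catalog connection
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift_conn",
        connection_options={"dbtable": "public.events", "database": "analytics"},
        redshift_tmp_dir="s3://example-bucket/tmp/")

    job.commit()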
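And a sketch of the kind of boto3 call used to put a CloudWatch alarm on a pipeline failure metric; the job name, alarm name, and SNS topic ARN are hypothetical:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Notify the on-call topic whenever a Glue job run reports failed tasks
    cloudwatch.put_metric_alarm(
        AlarmName="etl-pipeline-failures",
        Namespace="Glue",
        MetricName="glue.driver.aggregate.numFailedTasks",
        Dimensions=[{"Name": "JobName", "Value": "s3-to-redshift-etl"},
                    {"Name": "JobRunId", "Value": "ALL"},
                    {"Name": "Type", "Value": "count"}],
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-eng-alerts"])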
University of Illinois at Chicago (UIC) Oct 2022 – May 2024
Data Engineer (Graduate Assistant)
• Designed and implemented data processing workflows using Python and Pandas in Jupyter Notebook to clean and preprocess large datasets for university research projects, reducing data preparation time by 30% for faculty-led studies (see the cleaning sketch after this list).
• Utilized MySQL to create and manage relational databases for storing experimental data, ensuring data integrity and optimizing query performance for research analytics.
• Created interactive dashboards using Matplotlib and Seaborn to visualize research findings, enabling professors to present complex datasets to stakeholders during academic conferences (see the plotting sketch after this list).
• Assisted in setting up a small-scale Hadoop cluster with HDFS for a graduate-level course project, processing unstructured datasets to support research on urban data analytics.
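A minimal sketch of the Pandas cleaning workflow referenced above; the file path and column names are illustrative assumptions:

    import pandas as pd

    def clean_survey_data(path: str) -> pd.DataFrame:
        """Load a raw research CSV and apply the standard cleaning steps."""
        df = pd.read_csv(path)

        # Normalize column names and drop exact duplicate rows
        df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
        df = df.drop_duplicates()

        # Coerce numeric fields, turning bad values into NaN, then impute the median
        df["response_score"] = pd.to_numeric(df["response_score"], errors="coerce")
        df["response_score"] = df["response_score"].fillna(df["response_score"].median())

        # Standardize timestamps and drop rows with no usable date
        df["collected_at"] = pd.to_datetime(df["collected_at"], errors="coerce")
        return df.dropna(subset=["collected_at"])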
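And a sketch of the Matplotlib/Seaborn charting described in the dashboards bullet, again with hypothetical column names:

    import matplotlib.pyplot as plt
    import seaborn as sns

    def plot_findings(df):
        """Render the two summary charts used in presentation slides."""
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))

        # Distribution of cleaned response scores
        sns.histplot(data=df, x="response_score", kde=True, ax=axes[0])
        axes[0].set_title("Response score distribution")

        # Mean score per month over the collection period
        monthly = df.resample("M", on="collected_at")["response_score"].mean()
        sns.lineplot(x=monthly.index, y=monthly.values, ax=axes[1])
        axes[1].set_title("Mean score by month")

        fig.tight_layout()
        return fig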
Virtual Infotech Solution, India April 2020 – July 2022
Data Engineer
• Architected real-time data streaming pipelines using AWS Kinesis, processing high-velocity data from IoT devices and storing it in DynamoDB for real-time analytics (see the consumer sketch after this list).
• Developed complex data workflows using Apache Airflow, orchestrating ETL jobs across PostgreSQL and Cassandra databases to support scalable data operations (see the DAG sketch after this list).
• Implemented machine learning data preprocessing pipelines using TensorFlow and SciPy in Jupyter Notebook, enabling predictive analytics for customer behavior modeling.
• Optimized query performance in AWS Athena for ad-hoc analysis on Data Lake, reducing query execution time by 40% through partitioning and indexing strategies.
• Deployed containerized data applications using Docker and Kubernetes, ensuring scalable and portable deployments in AWS EC2 environments.
• Integrated Jenkins for CI/CD pipelines, automating data pipeline deployments and reducing release cycles by 20% for production environments.
• Created visualizations using Seaborn and Matplotlib in Python, presenting actionable insights to business teams for strategic decision-making.
• Monitored system performance using Grafana, setting up custom dashboards to track data pipeline metrics and ensure operational efficiency.
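A minimal single-shard sketch of the Kinesis-to-DynamoDB consumer pattern described in the first bullet above; the stream and table names are hypothetical, and a production consumer would use the KCL or a Lambda trigger rather than a polling loop:

    import json
    import time
    from decimal import Decimal

    import boto3

    kinesis = boto3.client("kinesis")
    table = boto3.resource("dynamodb").Table("iot_readings")

    def consume(stream_name: str) -> None:
        """Poll one shard and persist each IoT event to DynamoDB."""
        shard_id = kinesis.describe_stream(StreamName=stream_name)[
            "StreamDescription"]["Shards"][0]["ShardId"]
        iterator = kinesis.get_shard_iterator(
            StreamName=stream_name, ShardId=shard_id,
            ShardIteratorType="LATEST")["ShardIterator"]

        while True:
            out = kinesis.get_records(ShardIterator=iterator, Limit=100)
            for record in out["Records"]:
                # DynamoDB requires Decimal rather than float for numbers
                event = json.loads(record["Data"], parse_float=Decimal)
                table.put_item(Item=event)
            iterator = out["NextShardIterator"]
            time.sleep(1)  # stay under per-shard read limits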
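And a sketch of an Airflow DAG in the shape described in the orchestration bullet; the DAG id, task names, and callables are illustrative placeholders for the real extract and load logic:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_postgres():
        # Pull the day's rows from PostgreSQL (connection handling omitted)
        pass

    def load_cassandra():
        # Write the transformed rows into Cassandra (omitted)
        pass

    with DAG(
        dag_id="postgres_to_cassandra",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_postgres)
        load = PythonOperator(task_id="load", python_callable=load_cassandra)
        extract >> load  # load runs only after a successful extract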