Sushma Suda
Data Engineer
Chicago, IL 801-***-**** ************@*****.***
SUMMARY
Data Engineer with 5+ years of experience designing and optimizing scalable data pipelines and architectures. Proficient in Python, SQL, and cloud platforms (AWS, Azure), supporting data-driven decision-making. Adept at leveraging ETL processes, big data tools (Spark, Hadoop), and data warehousing (Snowflake, Redshift) to deliver high-performance solutions. Successfully led data migration projects, improving processing efficiency by 30% and reducing costs through automation. Strong problem-solving skills with a focus on data quality and system reliability.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, R, SQL, Java
IDEs: PyCharm, Jupyter Notebook, IntelliJ IDEA, Visual Studio
Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, HDFS, YARN, Data Lake, Data Warehouse
Cloud Technologies and Infrastructure: AWS (S3, DynamoDB, Lambda, EC2, Redshift, Glue, Athena, CloudWatch, Aurora, EMR)
ETL & Data Processing Tools: AWS Glue, Apache Airflow
Visualization Tools: Power BI
Streaming & Analytics Tools: AWS Kinesis, AWS Athena
Monitoring & Automation: AWS CloudWatch, AWS Lambda
Packages & Libraries: NumPy, Pandas, Matplotlib, SciPy, Seaborn, TensorFlow
Development and Operations Tools: Git, GitLab, Jenkins (CI/CD), Docker, Kubernetes, Grafana, Jira, GitHub
Databases: MySQL, PostgreSQL, MongoDB, Cassandra, Elasticsearch, RDBMS, NoSQL
EDUCATION
Master of Science in Management Information Systems, May 2024
University of Illinois at Chicago, Chicago, Illinois
EXPERIENCE
ServiceNow, IL May 2024 – Current
Data Engineer
• Designed and implemented ETL pipelines using AWS Glue to extract, transform, and load large datasets from S3 into Redshift, improving data processing efficiency by 30% for business analytics (see the Glue sketch after this list).
• Developed data processing scripts in Python using Pandas and NumPy to clean and preprocess raw data, enabling accurate reporting for downstream applications.
• Utilized Hadoop and MapReduce to process unstructured data in HDFS, optimizing batch processing workflows for a customer analytics platform.
• Built and maintained data models in MySQL for a relational database system, ensuring data integrity and supporting real-time querying for business applications.
• Leveraged Power BI to create interactive dashboards, visualizing key performance metrics for stakeholders, which reduced reporting time by 25%.
• Collaborated with cross-functional teams using Jira and Git for version control, following an Agile methodology to deliver iterative improvements in data pipelines.
• Automated monitoring of data workflows using AWS CloudWatch, setting up alerts to detect and resolve pipeline failures, achieving 99.9% uptime for critical processes (see the alarm sketch after this list).
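A minimal sketch of the S3-to-Redshift Glue job pattern described in the first bullet above; the catalog database, table, column mappings, and connection names are illustrative assumptions, not the production values:

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw dataset cataloged over S3 (names are illustrative)
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="analytics_db", table_name="raw_events")

    # Keep and retype only the columns the downstream reports need
    mapped = ApplyMapping.apply(frame=raw, mappings=[
        ("event_id", "string", "event_id", "string"),
        ("ts", "string", "event_ts", "timestamp"),
        ("amount", "double", "amount", "double"),
    ])

    # Load into Redshift through a Glue catalog connection
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift_conn",
        connection_options={"dbtable": "public.events", "database": "analytics"},
        redshift_tmp_dir="s3://example-bucket/tmp/")

    job.commit()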
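And a sketch of the kind of boto3 call used to put a CloudWatch alarm on a pipeline failure metric; the job name, alarm name, and SNS topic ARN are hypothetical:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Notify the on-call topic whenever a Glue job run reports failed tasks
    cloudwatch.put_metric_alarm(
        AlarmName="etl-pipeline-failures",
        Namespace="Glue",
        MetricName="glue.driver.aggregate.numFailedTasks",
        Dimensions=[{"Name": "JobName", "Value": "s3-to-redshift-etl"},
                    {"Name": "JobRunId", "Value": "ALL"},
                    {"Name": "Type", "Value": "count"}],
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-eng-alerts"])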
University of Illinois at Chicago (UIC) Oct 2022 – May 2024
Data Engineer (Graduate Assistant)
• Designed and implemented data processing workflows using Python and Pandas in Jupyter Notebook to clean and preprocess large datasets for university research projects, reducing data preparation time by 30% for faculty-led studies (see the cleaning sketch after this list).
• Utilized MySQL to create and manage relational databases for storing experimental data, ensuring data integrity and optimizing query performance for research analytics.
• Created interactive dashboards using Matplotlib and Seaborn to visualize research findings, enabling professors to present complex datasets to stakeholders during academic conferences (see the plotting sketch after this list).
• Assisted in setting up a small-scale Hadoop cluster with HDFS for a graduate-level course project, processing unstructured datasets to support research on urban data analytics.
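A minimal sketch of the Pandas cleaning workflow referenced above; the file path and column names are illustrative assumptions:

    import pandas as pd

    def clean_survey_data(path: str) -> pd.DataFrame:
        """Load a raw research CSV and apply the standard cleaning steps."""
        df = pd.read_csv(path)

        # Normalize column names and drop exact duplicate rows
        df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
        df = df.drop_duplicates()

        # Coerce numeric fields, turning bad values into NaN, then impute the median
        df["response_score"] = pd.to_numeric(df["response_score"], errors="coerce")
        df["response_score"] = df["response_score"].fillna(df["response_score"].median())

        # Standardize timestamps and drop rows with no usable date
        df["collected_at"] = pd.to_datetime(df["collected_at"], errors="coerce")
        return df.dropna(subset=["collected_at"])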
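And a sketch of the Matplotlib/Seaborn charting described in the dashboards bullet, again with hypothetical column names:

    import matplotlib.pyplot as plt
    import seaborn as sns

    def plot_findings(df):
        """Render the two summary charts used in presentation slides."""
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))

        # Distribution of cleaned response scores
        sns.histplot(data=df, x="response_score", kde=True, ax=axes[0])
        axes[0].set_title("Response score distribution")

        # Mean score per month over the collection period
        monthly = df.resample("M", on="collected_at")["response_score"].mean()
        sns.lineplot(x=monthly.index, y=monthly.values, ax=axes[1])
        axes[1].set_title("Mean score by month")

        fig.tight_layout()
        return fig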
Virtual Infotech Solution, India April 2020 – July 2022
Data Engineer
• Architected real-time data streaming pipelines using AWS Kinesis, processing high-velocity data from IoT devices and storing it in DynamoDB for real-time analytics (see the consumer sketch after this list).
• Developed complex data workflows using Apache Airflow, orchestrating ETL jobs across PostgreSQL and Cassandra databases to support scalable data operations (see the DAG sketch after this list).
• Implemented machine learning data preprocessing pipelines using TensorFlow and SciPy in Jupyter Notebook, enabling predictive analytics for customer behavior modeling.
• Optimized query performance in AWS Athena for ad-hoc analysis on Data Lake, reducing query execution time by 40% through partitioning and indexing strategies.
• Deployed containerized data applications using Docker and Kubernetes, ensuring scalable and portable deployments in AWS EC2 environments.
• Integrated Jenkins for CI/CD pipelines, automating data pipeline deployments and reducing release cycles by 20% for production environments.
• Created visualizations using Seaborn and Matplotlib in Python, presenting actionable insights to business teams for strategic decision-making.
• Monitored system performance using Grafana, setting up custom dashboards to track data pipeline metrics and ensure operational efficiency.
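A minimal single-shard sketch of the Kinesis-to-DynamoDB consumer pattern described in the first bullet above; the stream and table names are hypothetical, and a production consumer would use the KCL or a Lambda trigger rather than a polling loop:

    import json
    import time
    from decimal import Decimal

    import boto3

    kinesis = boto3.client("kinesis")
    table = boto3.resource("dynamodb").Table("iot_readings")

    def consume(stream_name: str) -> None:
        """Poll one shard and persist each IoT event to DynamoDB."""
        shard_id = kinesis.describe_stream(StreamName=stream_name)[
            "StreamDescription"]["Shards"][0]["ShardId"]
        iterator = kinesis.get_shard_iterator(
            StreamName=stream_name, ShardId=shard_id,
            ShardIteratorType="LATEST")["ShardIterator"]

        while True:
            out = kinesis.get_records(ShardIterator=iterator, Limit=100)
            for record in out["Records"]:
                # DynamoDB requires Decimal rather than float for numbers
                event = json.loads(record["Data"], parse_float=Decimal)
                table.put_item(Item=event)
            iterator = out["NextShardIterator"]
            time.sleep(1)  # stay under per-shard read limits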
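And a sketch of an Airflow DAG in the shape described in the orchestration bullet; the DAG id, task names, and callables are illustrative placeholders for the real extract and load logic:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_postgres():
        # Pull the day's rows from PostgreSQL (connection handling omitted)
        pass

    def load_cassandra():
        # Write the transformed rows into Cassandra (omitted)
        pass

    with DAG(
        dag_id="postgres_to_cassandra",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_postgres)
        load = PythonOperator(task_id="load", python_callable=load_cassandra)
        extract >> load  # load runs only after a successful extract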