Data Engineer, Information Systems

Location:
Aurora, IL
Posted:
April 23, 2024

Devi Madhuri

Contact: 475-***-****

Email: ad47s7@r.postjobfree.com

https://www.linkedin.com/in/devi-madhuri1205/

Data Engineer with 4 years of hands-on experience in Python, AWS, and ETL tools, specializing in the development and deployment of robust data infrastructure solutions. Proficient in building complex data pipelines and ensuring data quality, with expertise across databases, ETL processes, cloud-based data warehouses, and multiple programming languages. Demonstrated success in delivering impactful data-driven projects through effective collaboration with cross-functional teams.

EDUCATION

Master’s in Computer and Information Systems, CGPA: 3.95, New England College, Henniker, NH, May 2023

Bachelor’s in Electronics and Communication Engineering, CGPA: 3.54, JNTUH College of Engineering, Hyderabad, India, May 2020

SKILLS

Big Data Technologies: HDFS, Hadoop MapReduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Storm, Spark, Kafka, HBase, Spark Streaming, Machine Learning.

Web Technologies: HTML, JavaScript, jQuery, CSS.

Cloud Services: AWS, Azure, Terraform

Languages: C, SQL, Python, Shell Scripting, Scala, R, Core Java, JavaScript, Node.js, Golang.

Databases: SQL Server 2008, MySQL, PostgreSQL, Cassandra, Snowflake, Teradata.

Methodologies: Agile, Waterfall.

API Frameworks: Flask (Python)

Operating Systems: Windows 10/8/7/XP, Linux (Ubuntu 18.04), Unix.

Reporting Tools: Informatica PowerCenter, Tableau, SSIS, SSRS, Power BI.

WORK HISTORY

Bank of America, NY

Aug 2022 - Present

Python Data Engineer

Responsibilities:

• Developed a data platform from scratch and took part in the requirement-gathering and analysis phase of the project, documenting the business requirements.

• Designed tables in Hive and MySQL, used Sqoop to import and export data between databases and HDFS, and processed large datasets in structured, semi-structured, and unstructured forms.

• Developed REST APIs in Python using the Flask framework and integrated various data sources, including Java/JDBC, RDBMS, shell scripts, spreadsheets, and text files.
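
A minimal sketch of a Flask endpoint of the kind described above; the route, database file, and table are hypothetical placeholders rather than the actual project code.

```python
# Minimal Flask REST endpoint sketch; the route, database file, and
# table are hypothetical placeholders, not the actual project code.
from flask import Flask, jsonify
import sqlite3  # stand-in for the JDBC/RDBMS sources mentioned above

app = Flask(__name__)

@app.route("/api/records", methods=["GET"])
def get_records():
    # Query a local database and return the rows as JSON.
    conn = sqlite3.connect("example.db")  # hypothetical database file
    rows = conn.execute("SELECT id, name FROM records").fetchall()
    conn.close()
    return jsonify([{"id": r[0], "name": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(debug=True)
```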

• Worked with the Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.

• Worked on Python libraries and frameworks such as Flask, Django, and NumPy, enabling rapid development of web applications, data analysis, and scientific computing tasks.

• Configured EC2 instances, set up IAM users and roles, and created an S3 data pipe using the API to load data from internal data sources.
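
A sketch of the S3 loading step using boto3; the bucket, key, and file names are assumptions, and credentials are assumed to come from an attached IAM role.

```python
# Sketch of loading internal data into S3 with boto3; bucket, key, and
# file names are hypothetical, and credentials are assumed to come
# from an IAM role attached to the instance.
import boto3

s3 = boto3.client("s3")

def load_to_s3(local_path: str, bucket: str, key: str) -> None:
    # Upload a file produced by an internal source to S3.
    s3.upload_file(local_path, bucket, key)

load_to_s3("daily_extract.csv", "internal-data-bucket", "raw/daily_extract.csv")
```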

• Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using Python's Matplotlib library.

• Spearheaded the migration of on-premises data sources including MySQL, Oracle, and MongoDB to AWS cloud infrastructure, employing Kafka for messaging and Snowflake for data warehousing.

• Implemented AWS Step Functions to automate SageMaker tasks, orchestrated data pipelines with AWS Glue, and configured IAM security policies for fine-grained access control to AWS S3 and DynamoDB.
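
A sketch of triggering such an orchestration from Python with boto3; the state machine ARN and input payload are hypothetical, and the state machine itself (wrapping SageMaker and Glue tasks) is assumed to already exist.

```python
# Sketch of starting a Step Functions state machine that wraps
# SageMaker and Glue tasks; the ARN and payload are hypothetical.
import json
import boto3

sfn = boto3.client("stepfunctions")

response = sfn.start_execution(
    # Hypothetical state machine that runs a SageMaker training job
    # followed by a Glue ETL job.
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:train-and-etl",
    input=json.dumps({"training_job_name": "daily-model-train"}),
)
print(response["executionArn"])
```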

• Built Cassandra queries for various CRUD operations (create, read, update, delete) and used Bootstrap to manage and organize the HTML page layout.
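
A CRUD sketch with the DataStax cassandra-driver; the keyspace, table, and contact point are illustrative assumptions.

```python
# CRUD sketch using the DataStax cassandra-driver; the keyspace,
# table, and contact point are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo_keyspace")  # hypothetical keyspace

# Create
session.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (1, "alice"))
# Read
row = session.execute("SELECT name FROM users WHERE id = %s", (1,)).one()
# Update
session.execute("UPDATE users SET name = %s WHERE id = %s", ("bob", 1))
# Delete
session.execute("DELETE FROM users WHERE id = %s", (1,))
```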

• Developed entire frontend and backend modules using Python on the Django web framework and created the user interface (UI) with JavaScript, Bootstrap, and HTML5/CSS, backed by Cassandra and MySQL.

• Designed and implemented ETL workflows using Glue, EMR, and Step Functions.

• Integrated GitHub with CI/CD pipelines for automated build, test, and deployment processes.

• Created a data pipeline involving various AWS services, including S3, Kinesis Data Firehose, Kinesis Data Streams, SNS, SQS, Athena, and Snowflake.
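
A sketch of the ingestion edge of such a pipeline, publishing events into a Kinesis Data Firehose delivery stream; the stream name and event shape are assumptions.

```python
# Sketch of publishing an event into a Kinesis Data Firehose delivery
# stream that lands in S3 downstream; the stream name and event shape
# are hypothetical.
import json
import boto3

firehose = boto3.client("firehose")

event = {"user_id": 42, "action": "login"}
firehose.put_record(
    DeliveryStreamName="clickstream-to-s3",  # hypothetical stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```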

• Integrated Fivetran seamlessly into the broader data architecture, optimizing data replication and synchronization.

• Implemented Agile methodology for building internal applications.

• Collaborated with cross-functional teams to integrate Python-based solutions with existing systems and technologies, ensuring seamless interoperability and data flow across different platforms.

• Conducted data blending and data preparation using SQL for Tableau consumption and published data sources to the Tableau server.

Technologies Used: Python, Java, JavaScript, PySpark, AWS EC2, S3, EMR, Redshift, Glue, RDS, Lambda, VPC, IAM, CloudTrail, CloudWatch, Kinesis Data Streams, SNS, SQS, Athena, Snowflake, SQL, MySQL, PL/SQL, Oracle, Tableau, Git, Bitbucket, Jenkins, Jira.

Tata Consultancy Services, Hyderabad, India

Aug 2019 - Dec 2021

Python AWS Developer

Responsibilities:

• Used pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn, and NLTK in Python to develop data pipelines and various machine learning algorithms.
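
A small end-to-end sketch in the spirit of that stack, from pandas loading to a scikit-learn model; the input file, columns, and model choice are illustrative assumptions.

```python
# Minimal pandas + scikit-learn pipeline sketch; the CSV file, feature
# columns, and model choice are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("events.csv")            # hypothetical input file
X = df[["feature_a", "feature_b"]]        # hypothetical feature columns
y = df["label"]                           # hypothetical binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```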

• Utilized a variety of big data analytic tools including Kafka, Pig, Hive, and MapReduce to analyze and process data within the Hadoop cluster, facilitating insightful decision-making.

• Developed an Artificial Intelligence Platform that helps data scientists train, test, and develop AI models on Amazon SageMaker.

• Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.

• Utilized React to build interactive user interfaces and leveraged Node.js for server-side development, API integrations, and real-time data processing.

• Worked with different data formats such as JSON and XML.

• Integrated AWS CloudTrail with AWS CloudWatch, creating custom alarms and notifications for critical events, allowing for real-time response to potential issues and ensuring the integrity of data processing pipelines.
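
A sketch of wiring a CloudWatch alarm to an SNS topic for such real-time notifications; the metric namespace, alarm name, and topic ARN are hypothetical.

```python
# Sketch of a CloudWatch alarm on a custom pipeline metric, notifying
# an SNS topic; all names and ARNs are hypothetical.
import boto3

cw = boto3.client("cloudwatch")

cw.put_metric_alarm(
    AlarmName="pipeline-error-count-high",  # hypothetical alarm
    Namespace="DataPipeline",               # hypothetical custom namespace
    MetricName="ErrorCount",
    Statistic="Sum",
    Period=300,                             # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-alerts"],  # hypothetical topic
)
```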

• Developed MapReduce jobs in Java for data cleaning and processing, implemented Spark applications using Scala and Spark SQL for faster data analysis, and optimized queries for performance.
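
The bullet above cites Scala; the equivalent clean-then-aggregate pattern is sketched here in PySpark to stay in the document's Python register, with the input path, schema, and output location as assumptions.

```python
# PySpark sketch of the clean-then-query pattern; the input path,
# schema, and output location are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clean-and-query").getOrCreate()

df = spark.read.json("s3://raw-bucket/events/")  # hypothetical input path
df.createOrReplaceTempView("events")

# Drop malformed rows, then aggregate with Spark SQL.
cleaned = spark.sql("""
    SELECT user_id, COUNT(*) AS event_count
    FROM events
    WHERE user_id IS NOT NULL
    GROUP BY user_id
""")
cleaned.write.mode("overwrite").parquet("s3://curated-bucket/event_counts/")
```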

• Implemented AWS services like EC2, SQS, and SNS, exposing them as RESTful web services, and utilized AWS compute servers extensively for reporting and notification services.

• Drove comprehensive OLTP and OLAP development, optimizing real-time transactional processing and implementing advanced multidimensional modeling and ETL processes for impactful analytical solutions.

• Implemented advanced data wrangling techniques in pandas to handle missing values, outliers, and data inconsistencies, ensuring data quality and integrity for downstream analytics and modeling tasks.
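
A pandas sketch of those wrangling steps (missing values, outliers, inconsistencies); the column names and thresholds are illustrative assumptions.

```python
# Data-wrangling sketch for missing values, outliers, and
# inconsistencies; column names and thresholds are hypothetical.
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical input

# Fill missing numeric values with the column median.
df["amount"] = df["amount"].fillna(df["amount"].median())

# Clip outliers outside the 1st-99th percentile range.
low, high = df["amount"].quantile([0.01, 0.99])
df["amount"] = df["amount"].clip(lower=low, upper=high)

# Drop rows that remain inconsistent (e.g., negative amounts).
df = df[df["amount"] >= 0]
```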

• Demonstrated proficiency in Java and JavaServer Pages (JSP), where I actively contributed to the development and maintenance of web applications. Responsibilities included designing and implementing robust backend functionalities, integrating front-end components, and ensuring optimal performance and scalability of the applications.

• Loaded data from the UNIX file system to HDFS and analyzed large datasets to determine the optimal way to aggregate and report on them.

• Demonstrated a deep understanding of Agile principles in decision-making processes, fostering a culture of accountability, empowerment, and continuous learning within the data engineering team.

• Collaborated cross-functionally to integrate Python solutions with existing systems.

• Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.

• Gained deep experience with DevOps technologies such as Jenkins, Docker, and Kubernetes.

• Enhanced team productivity by introducing and implementing agile methodologies, leading to a 25% acceleration in project delivery timelines.

Technologies Used: Java/J2EE, C#, PL/SQL, Oracle, UNIX, Hadoop, HDFS, MapReduce, AWS EC2, RDS, S3, CloudWatch, Hive, ETL, Sqoop, Pig, HBase, Apache Spark, JIRA, Oozie Scheduler, Kafka, Git, Python, Django, Flask, Scala, Teradata.


