
Haarika Koneru

Data Engineer

704-***-****

*******.********@*****.***

PROFESSIONAL SUMMARY:

Around 4 years of experience architecting, designing, and developing highly scalable, end-to-end distributed data processing systems and data warehouses.

Experience writing MapReduce programs on Hadoop to process big data.

Hands-on working experience with AWS cloud services such as EMR, EC2, and S3.

Proficient in handling big data with the Hadoop architecture, HDFS, MapReduce, and ecosystem tools such as Hive.

Experience importing and exporting data between relational database systems and HDFS using Sqoop.

Hands-on experience with job scheduling tools such as Airflow, CloudWatch, and ZooKeeper.

Hands-on experience with cloud technologies such as Amazon AWS (S3, EC2, CloudFormation, etc.) and the Snowflake cloud data platform.

Proficient in advanced UNIX concepts, with working experience in advanced scripting/programming using Shell and Python.

Expertise in working with heterogeneous source/target systems for ETL (Oracle, SQL Server, DB2, Snowflake, Teradata, flat files, FTP).

Development and working exposure to software configuration and deployment management tools such as SVN, Bitbucket, and Jenkins.

Experienced in the Agile/XP software development model, with good exposure to XP techniques such as pair programming and test-driven development (TDD).

Excellent customer management, problem-solving, and debugging skills, with good verbal/written communication and presentation skills.

Involved in Unit Testing, Integration Testing, User Acceptance Test (UAT) preparation, and peer reviews.

Willingness to learn new technologies and ability to work both independently and in a team environment.

TECHNICAL SKILLS:

BIG DATA ECOSYSTEMS

HADOOP, HDFS, MAPREDUCE, HIVE, PIG, SPARK, SCALA

SCRIPTING LANGUAGES

PYTHON, BASH, JAVASCRIPT, XML, HTML.

TECHNOLOGIES

JAVA, HTML/HTML5, CSS2/CSS3, DHTML, XML, XHTML, JAVASCRIPT, AJAX, JQUERY, JSON.

IDE & TOOLS

ECLIPSE, NOTEPAD++, INTELLIJ, PYCHARM, SUBLIME TEXT2, TEXT MATE, MICROSOFT PUBLISHER, VISUAL STUDIO CODE

VERSION CONTROL

PERFORCE, CVS, SVN, GIT, BITBUCKET

METHODOLOGIES

AGILE, WATERFALL

DATABASE

PL/SQL(ORACLE), MSSQL, MONGO DB, CASSANDRA, REDIS, POSTGRESQL, MYSQL, SNOWFLAKE

OPERATING SYSTEMS

WINDOWS 98/2000/XP/VISTA/7/8, MAC OS X, LINUX

PROFESSIONAL EXPERIENCE:

Client: Nike, Inc. April 2018 – Present

Role: Big Data Developer

Project: Digital Demand Sensing

Responsibilities:

Involved in writing Hive scripts to extract, transform and load data into the database.

Developed Spark applications to process HDFS data and perform in-memory operations.

Migrated MapReduce programs to Spark transformations.

Developed Airflow workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
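
For illustration, a minimal sketch of what such a daily incremental Airflow workflow can look like; the extract command, table names, schedule, and paths below are assumptions for the sketch, not details from the actual project.

    # Hypothetical Airflow DAG: daily incremental load from Teradata into a Hive table.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # airflow.operators.bash in Airflow 2.x

    default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="teradata_to_hive_incremental",
        default_args=default_args,
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        # Extract the previous day's records from Teradata into a staging area
        # (the extract script is a placeholder).
        extract = BashOperator(
            task_id="extract_from_teradata",
            bash_command="python /opt/jobs/extract_teradata.py --run-date {{ ds }}",
        )

        # Load the extracted files into a date-partitioned Hive table.
        load = BashOperator(
            task_id="load_into_hive",
            bash_command=(
                "hive -e \"LOAD DATA INPATH '/staging/orders/{{ ds }}' "
                "INTO TABLE analytics.orders PARTITION (load_date='{{ ds }}')\""
            ),
        )

        extract >> load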

Worked with NoSQL data stores and dimensional modeling approaches (star and snowflake schemas).

Developed complex stored procedures and queries.

Defined AWS security groups, which acted as virtual firewalls to control incoming traffic to one or more AWS EC2 instances.

Created Spark applications to perform data cleansing, validation, and transformation according to requirements.

Developed complex PySpark scripts to load files from sources such as Teradata, Snowflake, and DB2 into Hive tables, and to generate files to be loaded to S3.
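
A minimal PySpark sketch of this kind of load; the JDBC URL, driver, credentials, and table/bucket names are placeholders, not values from the project.

    # Hypothetical PySpark job: read a source table over JDBC, persist to Hive, export to S3.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("source_to_hive_load")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read the source table (Teradata here) through its JDBC driver.
    src_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:teradata://td-host/DATABASE=sales")
        .option("dbtable", "sales.orders")
        .option("user", "etl_user")
        .option("password", "****")
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .load()
    )

    # Light cleansing/validation before persisting.
    clean_df = src_df.dropDuplicates(["order_id"]).filter("order_date IS NOT NULL")

    # Persist into a Hive table for downstream consumers.
    clean_df.write.mode("overwrite").saveAsTable("analytics.orders")

    # Also generate files to be picked up from S3.
    clean_df.write.mode("overwrite").parquet("s3a://my-bucket/exports/orders/")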

Created Athena tables on top of S3 files.
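
A sketch of creating such a table with boto3 and Athena DDL; the database, table schema, and S3 locations are illustrative assumptions.

    # Hypothetical example: register an Athena external table over CSV files in S3.
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    ddl = """
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
        order_id   string,
        order_date date,
        amount     double
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-bucket/exports/orders/'
    TBLPROPERTIES ('skip.header.line.count' = '1')
    """

    # Athena executes DDL asynchronously; query metadata lands in the results bucket.
    response = athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )
    print(response["QueryExecutionId"])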

Created Python scripts for the Airflow scheduling tool to automate the job run process.

Used GitHub and Bitbucket repositories to manage code, which is deployed using a Jenkins pipeline.

Supported production environment during 24 x 7 on-call rotation.

Project: Integrated Demand and Assortment Planning

Responsibilities:

Created Python scripts to run Snowflake queries and generate files to be loaded into S3.
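
A minimal sketch of this pattern using the Snowflake Python connector and boto3; the account, credentials, query, and bucket names are placeholders.

    # Hypothetical script: run a Snowflake query, write the result to CSV, upload to S3.
    import csv

    import boto3
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.us-east-1",
        user="ETL_USER",
        password="****",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    with conn.cursor() as cur:
        cur.execute("SELECT order_id, order_date, amount FROM orders WHERE order_date = CURRENT_DATE - 1")
        rows = cur.fetchall()
    conn.close()

    with open("/tmp/orders.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "order_date", "amount"])
        writer.writerows(rows)

    boto3.client("s3").upload_file("/tmp/orders.csv", "my-bucket", "exports/orders.csv")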

Developed complex functions to load CSV files from S3 into a Postgres database.
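
A sketch of one such load function; the bucket, key, table, and connection details are assumed for illustration, and psycopg2's COPY support is used for the bulk load.

    # Hypothetical function: stream a CSV object from S3 into a Postgres table via COPY.
    import boto3
    import psycopg2


    def load_csv_to_postgres(bucket, key, table):
        obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
        conn = psycopg2.connect(host="pg-host", dbname="analytics", user="etl_user", password="****")
        with conn, conn.cursor() as cur:
            # COPY ... FROM STDIN is far faster than row-by-row INSERTs for bulk loads.
            cur.copy_expert(f"COPY {table} FROM STDIN WITH CSV HEADER", obj["Body"])
        conn.close()


    load_csv_to_postgres("my-bucket", "exports/orders.csv", "staging.orders")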

Experience developing, tuning, and debugging code, stored procedures, functions, and packages.

Used an AWS Glue crawler on a data source to create the schema in the AWS Glue Data Catalog, which Athena uses to store and retrieve table definitions.
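
A sketch of defining and running such a crawler with boto3; the crawler name, IAM role, database, and S3 path are placeholders.

    # Hypothetical example: crawl an S3 prefix so its schema lands in the Glue Data Catalog,
    # where Athena can then resolve it as a queryable table.
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    glue.create_crawler(
        Name="orders_crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
        DatabaseName="analytics",
        Targets={"S3Targets": [{"Path": "s3://my-bucket/exports/orders/"}]},
    )

    glue.start_crawler(Name="orders_crawler")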

Developed ETL pipelines in and out of the data warehouse using a combination of Python and SnowSQL.

Loaded data from different sources (Linux file systems, Teradata, and DB2) into Snowflake.
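
A minimal sketch of loading a file-system extract into Snowflake through a table stage; the account details, table, and file names are assumptions.

    # Hypothetical load: PUT a local extract to the table stage, then COPY INTO the table.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.us-east-1",
        user="ETL_USER",
        password="****",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    with conn.cursor() as cur:
        # Upload the extract from the Linux file system to the table's internal stage.
        cur.execute("PUT file:///data/extracts/orders.csv @%ORDERS AUTO_COMPRESS=TRUE")
        # Bulk-load the staged file into the target table.
        cur.execute(
            "COPY INTO ORDERS FROM @%ORDERS "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) ON_ERROR = 'ABORT_STATEMENT'"
        )
    conn.close()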

Worked on tuning stored procedures and T-SQL queries to improve performance and sustainability.

Experience in Data Migration from RDBMS to Snowflake cloud data warehouse.

Developed functional, high-performance SQL queries to move data into and out of Snowflake.

Created Python scripts for Airflow jobs that run scripts to load data into the SAS tool.

Worked on Agile scrum team with daily standups and bi-weekly sprint planning sessions.

ENVIRONMENT: PYTHON, SPARK, HADOOP, SNOWFLAKE, HIVE, PIG, SQOOP, HUE, TERADATA, AWS (EMR, EC2, REDSHIFT, S3, GLUE, ATHENA), APACHE AIRFLOW, JENKINS, CIRCLECI, JAVA, GIT, BITBUCKET, JIRA, SLACK, MICROSOFT TEAMS.

Client: Federal Soft Systems Nov 2017 – Mar 2018

Role: Data Engineer

Responsibilities:

Leveraged DevOps techniques and practices (such as continuous integration, testing, and automation) along with application deployment tools (Maven, Git, Unix scripting, Ansible, and Docker) to automate builds and deployments.

Developed multiple Spark jobs for data cleaning and pre-processing.

Loaded data into the database and accessed it via different ecosystem tools.

Implemented partitioning and bucketing in Hive to optimize storage.
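
As an illustration of this technique, a Hive-style DDL issued through Spark SQL; the database, columns, partition key, and bucket count are placeholders.

    # Hypothetical DDL: a date-partitioned, bucketed Hive table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive_ddl").enableHiveSupport().getOrCreate()

    # Partitioning prunes whole directories at query time; bucketing clusters rows by a
    # key within each partition, which helps joins and sampling on that key.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.events (
            event_id STRING,
            user_id  STRING,
            payload  STRING
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)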

Developed a streaming data pipeline using Apache Spark to store data into HDFS.
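
A minimal Structured Streaming sketch of such a pipeline; Kafka is assumed as the source here (it is not named in the original bullet), and the brokers, topic, and HDFS paths are placeholders. Reading from Kafka also requires the spark-sql-kafka package on the classpath.

    # Hypothetical streaming job: land incoming events on HDFS as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("events_to_hdfs").getOrCreate()

    # Read a stream of raw events (Kafka assumed as the source).
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .load()
        .selectExpr("CAST(value AS STRING) AS raw_event", "timestamp")
    )

    # Continuously append events to HDFS, with checkpointing for failure recovery.
    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/raw/events/")
        .option("checkpointLocation", "hdfs:///checkpoints/events/")
        .outputMode("append")
        .trigger(processingTime="5 minutes")
        .start()
    )

    query.awaitTermination()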

Developed Spark batch jobs to load data into Spark RDDs, apply transformations and actions, and store the resulting data in Hive for data scientists to use.

Migrated NoSQL data to Hive using complex parsing logic.

Worked with Spark Datasets and Spark SQL for faster processing of data.

Improved performance of developed applications by scaling down the memory used on the cluster by the applications.

Worked with testing team to identify and fix bugs and improve the reliability of applications.

Wrote shell scripts to automate Spark jobs and to create alerts on the performance of the applications.

Used Amazon services to run SQL queries directly against exabytes of unstructured data in Amazon S3.

Participated in retrospective meetings after every sprint to review the previous sprint, its drawbacks, and the scope for improvement.

ENVIRONMENT: JAVA, SPARK, HIVE, MYSQL, PYSPARK, SHELL SCRIPTING, TABLEAU, AGILE

Client: Tek Services, LLC. March 2017 – Nov 2018

Role: Data Engineer

Responsibilities:

Leveraged DevOps techniques and practices (such as continuous integration, testing, and automation) along with application deployment tools (Maven, Git, Unix scripting, Ansible, and Docker) to automate builds and deployments.

Developed multiple Spark jobs for data cleaning and pre-processing.

Loaded data into the database and accessed it via different ecosystem tools.

Implemented partitioning and bucketing in Hive to optimize storage.

Developed a streaming data pipeline using Apache Spark to store data into HDFS.

Developed Spark batch jobs to load data into Spark RDDs, apply transformations and actions, and store the resulting data in Hive for data scientists to use.

Used Hibernate named queries to retrieve data from the database and integrated with Spring MVC to interact with the back-end persistence system.

Migrated NoSQL data to Hive using complex parsing logic.

Worked with Spark Datasets and Spark SQL for faster processing of data.

Improved performance of developed applications by scaling down the memory used on the cluster by the applications.

Worked with testing team to identify and fix bugs and improve the reliability of applications.

Wrote shell scripts to automate Spark jobs and to create alerts on the performance of the applications.

Used Amazon services to run SQL queries directly against exabytes of unstructured data in Amazon S3.

Participated in retrospective meetings after every sprint to review the previous sprint, its drawbacks, and the scope for improvement.

ENVIRONMENT: JAVA, SPARK, HIVE, MYSQL, PYSPARK, SHELL SCRIPTING, TABLEAU, AGILE

EDUCATIONAL QUALIFICATIONS:

Master’s in Computer Science, Kent State University (3.4 GPA)

Bachelor’s in Computer Science, JNTUH, India (3.1 GPA)

CERTIFICATIONS:

AWS Certified Developer-Associate


