
Anusha Karri

Email: ***********.***@*****.*** | https://www.linkedin.com/in/anusha-k-b4144864/ | Ph# 302-***-****

PROFESSIONAL SUMMARY

An IT professional with 5+ years of software development experience, looking for an opportunity to learn and to contribute my skills toward organizational growth.

Extensive experience as a Hadoop Developer and Big Data Analyst with primary technical skills in HDFS, YARN, Pig, Hive, Sqoop, Spark, Oozie, and HCatalog.

Well versed with Big Data on AWS cloud services, i.e., AWS S3, EMR, Lambda, CloudFormation, CloudWatch, Service Catalog, and Athena.

Experienced in designing, developing, deploying, and supporting large-scale distributed systems.

Experience in handling data from different sources such as Mainframe (EBCDIC and ASCII), flat files, sequence files, and different relational database systems.

Knowledge of Pig and Hive’s analytical functions, extending Hive and Pig core functionality by writing custom UDFs.

Experience in importing and exporting terabytes of data between HDFS and relational database systems (Teradata, SQL Server, Oracle) using Sqoop.

Experience in creating Oozie workflows.

Hands-on experience in developing applications using Spark.

Experience in creating RDDs and DataFrames and working with Spark SQL.

Knowledge of MapReduce jobs.

Developed Python, shell scripting, and Spark-based applications using the PyCharm and Anaconda integrated development environments.

Experience in working with different file formats like Text, Avro, and Parquet.

Developed Python scripts for automating and monitoring jobs.

Developed Python scripts to perform data integrity (DI) checks.
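A DI check of this kind typically boils down to comparing record counts between a source extract and the loaded target. A minimal PySpark sketch, with hypothetical bucket, prefix, and table names:

```python
import sys

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("di-check-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source extract on S3 and curated Hive table
source_count = spark.read.parquet("s3://example-bucket/landing/orders/").count()
target_count = spark.sql("SELECT COUNT(*) AS cnt FROM curated.orders").first()["cnt"]

if source_count != target_count:
    print(f"DI check FAILED: source={source_count}, target={target_count}")
    sys.exit(1)  # non-zero exit code so the scheduler marks the load as failed
print(f"DI check passed: {source_count} rows in both source and target")
```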

Created EMR, S3, and CloudWatch resources (e.g., event rules) using the AWS Boto3 SDK for Python.
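A rough sketch of such provisioning calls with Boto3; the bucket name, rule name, schedule, EMR release label, instance types, and IAM role names below are illustrative assumptions, not values from an actual project:

```python
import boto3

# S3 bucket (us-east-1; other regions need a CreateBucketConfiguration)
s3 = boto3.client("s3")
s3.create_bucket(Bucket="example-data-bucket")

# CloudWatch Events rule that fires on a schedule
events = boto3.client("events")
events.put_rule(
    Name="example-nightly-trigger",
    ScheduleExpression="cron(0 2 * * ? *)",  # 2 AM UTC daily
    State="ENABLED",
)

# Transient EMR cluster with Spark and Hive installed
emr = boto3.client("emr")
emr.run_job_flow(
    Name="example-spark-cluster",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```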

Created various AWS resources using CloudFormation templates.

Wrote Python scripts to automate assuming IAM roles and copying S3 files to different environments.
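A minimal sketch of that pattern, with a hypothetical role ARN, bucket names, and prefix: STS issues temporary credentials for the target environment, and an S3 client built from those credentials copies the objects.

```python
import boto3

TARGET_ROLE_ARN = "arn:aws:iam::123456789012:role/example-cross-account-role"
SRC_BUCKET, DEST_BUCKET = "example-dev-bucket", "example-prod-bucket"

# Assume the IAM role for the target environment via STS
sts = boto3.client("sts")
creds = sts.assume_role(RoleArn=TARGET_ROLE_ARN,
                        RoleSessionName="s3-copy-session")["Credentials"]

# Build an S3 client from the temporary credentials
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Copy every object under a prefix to the destination bucket
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix="curated/orders/"):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SRC_BUCKET, "Key": obj["Key"]},
        )
```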

Pushed application logs from EMR/EC2 instances to Elasticsearch and created dashboards from them to monitor the health of our applications.
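One simple way to ship such logs is to post each log line as a JSON document to Elasticsearch's REST API; the endpoint, index name, and log path below are hypothetical:

```python
import datetime
import json

import requests

ES_URL = "http://example-es-host:9200"   # hypothetical Elasticsearch endpoint
INDEX = "emr-app-logs"                   # hypothetical index name

def ship_log_line(line, log_file):
    """Index one application log line so it can be charted in a dashboard."""
    doc = {
        "@timestamp": datetime.datetime.utcnow().isoformat(),
        "log_file": log_file,
        "message": line.rstrip("\n"),
    }
    resp = requests.post(f"{ES_URL}/{INDEX}/_doc",
                         data=json.dumps(doc),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()

# Hypothetical log path on the EMR/EC2 instance
with open("/var/log/spark/spark-application.log") as fh:
    for line in fh:
        ship_log_line(line, "spark-application.log")
```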

Expertise in preparing test cases, documenting, and performing unit and integration testing.

Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.

Fast learner with good interpersonal skills and strong analytical and communication skills.

TECHNICAL SKILLS

Programming: SQL, Python, PySpark, SAP ABAP on HANA
Hadoop Stack: HDFS, MapReduce, Spark, Spark SQL, RDDs, Hive, Pig, Sqoop, HCatalog, Oozie
AWS Stack: AWS S3, EMR, Lambda, CloudFormation, CloudWatch, Service Catalog, Athena
Scripting Languages: Shell scripting and Python scripting
DBMS / RDBMS: Oracle 11g, MySQL
Version Control: Git and Bitbucket

PROFESSIONAL EXPERIENCE

Accenture Solutions Private Limited

June 2015 - August 2018

Role: Hadoop Application Developer

Environment: Cloudera cluster (CDH 5.8.2), HDFS, Hive, Pig, Sqoop, Teradata, SQL Server, Oracle, UNIX shell scripting, Oozie, Impala, PySpark, Spark SQL, Python, Autosys, Hue, file formats (Parquet, Avro, and Text), AWS S3, EMR, Lambda, CloudFormation, CloudWatch.

ROLES AND RESPONSIBILITIES

Involved in sourcing data from different sources (Mainframe, Teradata, flat files, etc.) into HDFS. Data was processed and loaded into HDFS monthly, and Hive tables were created to access it.

Involved in importing and exporting data from different databases like Oracle, Teradata and SQL Server using Sqoop.

Involved in creating common components to automate the unique-key generation process for millions of records, as well as an update component, using Spark in Python.

Created a wrapper script to automatically trigger any Oozie workflow and scheduled it through Autosys.
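A minimal sketch of such a wrapper in Python, assuming the standard Oozie CLI is on the path; the Oozie endpoint shown is hypothetical, and Autosys would simply invoke this script with a job.properties file:

```python
import subprocess
import sys

OOZIE_URL = "http://example-oozie-host:11000/oozie"  # hypothetical endpoint

def run_workflow(properties_file):
    """Submit and start an Oozie workflow; return its job id."""
    cmd = ["oozie", "job", "-oozie", OOZIE_URL,
           "-config", properties_file, "-run"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # The CLI prints a line such as "job: 0000123-200806-oozie-W"
    return out.split("job:")[-1].strip()

if __name__ == "__main__":
    job_id = run_workflow(sys.argv[1])
    print(f"Started Oozie workflow {job_id}")
```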

Involved in creating Hive tables using different file formats.

Worked with different file formats like Avro, Parquet, Text, Sequence, and ORC.

Developed a component which stores all passwords in encrypted format.

Developed scripts using HiveQL and Pig Latin to transform data before it is delivered to end users.

Developed scripts using HiveQL, applied business logic and generated reports as per business requirement.

Created RDDs and DataFrames to load data and performed join operations on them.

Registered DataFrames as tables/views and used Spark SQL to query them.
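A minimal PySpark sketch of this pattern (input paths and column names are hypothetical): load two DataFrames, join them, register the result as a temporary view, and query it with Spark SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# Hypothetical curated datasets on S3
orders = spark.read.parquet("s3://example-bucket/curated/orders/")
customers = spark.read.parquet("s3://example-bucket/curated/customers/")

# DataFrame join on a shared key
joined = orders.join(customers, on="customer_id", how="inner")

# Register as a temporary view and query it with Spark SQL
joined.createOrReplaceTempView("orders_enriched")
totals = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders_enriched
    GROUP BY customer_id
""")
totals.show()
```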

Developed UDFs when necessary for use in Pig and Hive queries.

Created internal and external tables with properly defined static and dynamic partitions for efficiency.
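As an illustration, partitioned Hive tables can be created and loaded from PySpark with Hive support enabled; the schema, locations, partition column, and staging table below are hypothetical, and the two inserts show the dynamic and static partition styles respectively.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioned-tables-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table partitioned by load_date (hypothetical schema and location)
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/sales_ext'
""")

# Dynamic-partition insert: partition values come from the query itself
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_ext PARTITION (load_date)
    SELECT order_id, amount, load_date FROM staging_sales
""")

# Static-partition insert: the partition value is fixed in the statement
spark.sql("""
    INSERT OVERWRITE TABLE sales_ext PARTITION (load_date='2020-08-01')
    SELECT order_id, amount FROM staging_sales WHERE load_date = '2020-08-01'
""")
```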

Developed common Oozie templates that can be included in other end-to-end workflows.

Performed Data quality checks and Data Integrity checks using Java components.

Developed end-to-end workflows using Oozie and scheduled them using Autosys.

Exposed views to downstream consumers using Impala and controlled access to them through Sentry.

Involved in estimating risk and planning the migration of code and terabytes of data to a new cluster with different configurations.

Experience in creating AutoSys jobs for scheduling workloads.

Migrated the code of over 200 production Oozie jobs to the new platform.

Developed automation scripts to remediate Oozie Hive actions to Beeline actions in over 200 XML files.

Responsible for replicating over 500 tables/views on the new platform.

Responsible for migrating data to target locations (over 700 tables) in the new cluster.

Documented system processes and procedures for future reference.

Transformed reports that ran on legacy systems like Mainframe and Teradata to run on Hadoop.

Developed and ran serverless Spark-based applications using the AWS Lambda service and PySpark to compute metrics for various business requirements.
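A heavily simplified sketch of that idea, assuming PySpark and an S3A connector are packaged with the Lambda function (for example as a container image); the paths and metric logic are hypothetical, and Spark runs in local mode inside the function:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def lambda_handler(event, context):
    # Spark runs locally inside the function's container
    spark = (SparkSession.builder
             .appName("lambda-metrics-sketch")
             .master("local[*]")
             .getOrCreate())

    # Hypothetical input dataset and metric
    orders = spark.read.parquet("s3a://example-bucket/curated/orders/")
    daily = orders.groupBy("order_date").agg(F.sum("amount").alias("daily_amount"))
    row_count = daily.count()

    daily.write.mode("overwrite").parquet("s3a://example-bucket/metrics/daily_amount/")
    spark.stop()
    return {"status": "ok", "rows": row_count}
```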

Used Git as the version control tool for maintaining software.

Analyzed, stored, and processed data captured from different sources using AWS cloud services such as S3, CloudFormation, Lambda, and EMR.

Very good knowledge of creating and modifying S3 bucket policies, creating IAM roles and policies, granting users read and write access to S3 buckets, etc.

Responsible for using Hive and Spark to process large amounts of data on a daily basis, maintaining the processed datasets in AWS S3, and publishing the data for downstream teams to access.

Designed and developed ETL pipeline from scratch using Sqoop, Spark, Hive, Athena, EMR, Autosys to gather data insights as per business requirements.

Tuned parameters for existing Spark jobs.

Troubleshot issues related to core Hadoop components like Hive, Pig, and Spark on EMR clusters.

Used the JIRA tool to update tasks/stories created by the agile lead.

PROJECT PROFILE

Client: Carlsberg

February 2015 - June 2015

Role: Application Development Analyst

ROLES AND RESPONSIBILITIES

Worked on the SRM portal (addition of two fields in the shopping cart for limit items only, and populating their values based on the conditions).

Worked on BAPIs of the SRM portal.

Supported RICEF objects as well, though recruited into the project for the SRM role (worked on changing the inspection lot end date to a 7-day calendar, company-code specific, and changing a value in an IDoc segment based on the values in the other fields of the segment).

Production support and performance tuning for custom reports.

PROJECT PROFILE

Client: Novartis

June 2013 - January 2015

Role: Associate software Engineer

ROLES AND RESPONSIBILITIES

Worked on Service Orders (different RICEF objects) for a year initially and moved to small projects (SRM) later.

Good experience in developing forms for various languages and regions, such as Hebrew, Chinese, Korean, Ireland, Hong Kong, Slovakia, German, Benelux, Romania, etc.

Worked on Web Dynpro, field validations, and BADIs related to purchase order and shopping cart catalogs.

Worked on enhancements to the XML PO output and on sending PO output via e-mail.

Worked on enhancements for the automatic generation of mails to different levels of approvers while creating SRM shopping carts.

Worked on translations in shopping cart and PO portals.

Basic knowledge of IDocs.

Worked on reports and Enhancements:

Creation of implicit enhancements to add a column to the VA02 transaction based on a specific sales area.

Creation of implicit enhancements to create a batch number while releasing the process order (COR2).

Creation of implicit enhancements for batch seed elimination in MM02.

Worked on creation of interactive reports.

Creation of a user exit to display text based on the company codes and posting keys in the FB01/FB02 transactions.

Worked on issues related to tax invoice printing and fetching the master recipe data from C202.

EDUCATION

Master's, Information Systems Technology, January 2019 - June 2020
Wilmington University, New Castle, DE. GPA: 3.9/4.0

Bachelor of Technology, Computer Science and Engineering, June 2009 - July 2013
Jawaharlal Nehru Technological University, Hyderabad, India. GPA: 3.89/4.0

VOLUNTEERING

CSR Volunteer Lead at Accenture, Hyderabad, India, from 2014 to 2018.

Volunteered as part of NSS from 2011 to 2013 (India).

SKILLS

Data Engineer: AWS, MySQL, Python, Spark, PySpark, Hive, Oozie, Autosys, NoSQL, EMR, ECR, EC2, Lambda, Glue, Unix, Linux, Bash, pipelines, data transformation, programming languages, problem solving, critical thinking, data integrity, data modeling, data quality, database design, Snowflake, data cleaning, data standardization, HBase, integration, interfaces, Cassandra, architecture, Unix utilities, Big Data, Apache Spark, cloud computing, data mining, ETL.


