Hadoop Developer Data Engineer

Seattle, WA
March 17, 2023


Sairam Tabibu

Role: Big Data Engineer

Phone: +1-206-***-****



●7+ years of IT experience in Data Engineering, Analytics, Data Modeling, Python, Cloud Technologies (AWS, Azure), and Big Data (Scala, PySpark, Hadoop, and HDFS environments).

●Python development experience includes adopting new tools and technical developments (libraries used: Beautiful Soup, Jasy, PyTorch, TensorFlow, Tensor vision, Keras, NumPy, Theano, LightGBM, SciPy, Matplotlib, Pickle, PySide, OpenCV, inflection, python-twitter, Pandas, networks, urllib2, MySQLdb for database connectivity, scikit-learn).

●Experience developing ETL job scheduling for data extraction, transformation, and loading using Informatica PowerCenter.

●Implemented UNIX shell scripts to run Informatica workflows and control the ETL flow.

●Proficient in developing SQL queries against relational databases such as Oracle and SQL Server in support of Data Warehousing and Data Integration solutions using Informatica PowerCenter and DataStage.

●Firm understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.

●Implemented Big Data solutions using Hadoop technology stack, including PySpark, Hive, Sqoop, Avro.

●Expertise in Creating, Debugging, Scheduling, and Monitoring jobs using Airflow and Oozie.

●Experienced in Optimizing the PySpark jobs to run on Kubernetes Cluster for faster data processing.

●Converted Hive queries into Spark actions and transformations by creating RDDs and DataFrames from the required files in HDFS.

●Expertise in the Beautiful Soup and Scrapy libraries for designing web crawlers.

●Proficient in developing websites and web applications using MySQL, HTML 4.01/5, XML, JSON, CSS, JavaScript, and AJAX.

●Applied Test Driven Development and Behavior Driven Development methodologies on enterprise-wide projects.

●Built out a next-generation public-facing website with the Django web framework.

●Used Pandas, NumPy, Matplotlib, and SciPy to develop various machine learning algorithms in Python.

●Proficient in applying statistical modeling and machine learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, Bayesian methods, XGBoost) to forecasting and predictive analytics.

●Experience in DynamoDB database design, including indexing and sharding.

●Experienced in working with various Python Integrated Development Environments like NetBeans, PyCharm, PyStudio, Eclipse and Sublime Text.

●Experience developing applications using Amazon Web Services such as EC2, CloudSearch, Elastic Load Balancing (ELB), CloudFront, and Route 53.

●Experience using version control systems such as Git (GitHub / Bitbucket) and SVN.

●Implemented CI/CD pipelines for testing and deploying applications.

●Created Docker containers and container images, tagged and pushed images, and used Docker consoles to manage the application life cycle.

●Experience analyzing and handling large datasets using S3 data architecture.

●Defined user stories and drove the agile board in JIRA during project execution; participated in sprint demos and retrospectives.

Area of Expertise:

Hadoop Ecosystem: HDFS, SQL, YARN, MapReduce, Hive, Sqoop, Spark, Zookeeper, Oozie, Kafka, Flume

Programming Languages: Python, Spark, Scala

Big Data Platforms: Hortonworks, Cloudera

AWS Services: S3, DynamoDB, Lambda, Batch, IAM, SQS, SNS, Step Functions, CloudWatch, EC2, SageMaker, ECS, API Gateway, VPC, Elasticsearch, Kinesis, Fargate, Hydra, CloudFormation, EBS

Azure Services: Azure, Azure Data Factory Pipeline, Cosmos DB

Operating Systems: Linux, Windows, UNIX, macOS

Databases: MySQL, NoSQL, UDB, HBase, MongoDB, Cassandra, Snowflake, DBT

Development Methods: Agile/Scrum, Waterfall

IDEs: PyCharm, Visual Studio

Data Visualization: Tableau, Power BI

Professional Experience:

AMAZON, Seattle, WA Aug 2021 – Mar 2023

Role: Python Software Developer

●Launched Amazon Style, Amazon's first-ever clothing store, which used virtual AI recommendation stylists instead of human stylists and was featured as the most innovative retail store of the year.

●Handled analysis, design, development, testing, customization, bug fixes, enhancement, support, and implementation of various machine learning workflows and microservices.

●Created CRUD (Create, Read, Update, and Delete) and REST APIs that powered the in-store widgets customers interacted with directly.

●Extensive experience working with multiple AWS services and managing resources and CI/CD pipelines which handle direct customer traffic.

●Created custom AWS IAM roles to enable critical actions for the team.

●Used AWS Batch to ingest data from DynamoDB tables from sister teams to curate and train ML models.

●Wrote custom scripts to create tables in DynamoDB, a non-relational NoSQL database, and custom code to process and update fields across millions of records.

●Extensive experience working with Java using object-oriented programming (OOP) principles, decorators, DAO providers, exception handling, and Coral.

●Wrote 1000+ lines of JUnit tests covering the main logic in the code.

●Used AWS EC2 to maintain the hosts, deprecate the hosts if necessary and redeploy the system on new hosts.

●Used AWS CDK with TypeScript to set up the CI/CD pipelines responsible for the team's deployments.

●Solid experience with Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.

●Expertise in test automation and continuous delivery of web applications and in testing web services/REST APIs with Python.

●Created AWS Lambda functions that ingested inventory and purchase data in JSON format and stored it in DynamoDB.

●Wrote a Python automation script to set the DPS settings of EC2 hosts during deployment.

●Wrote 1000+ lines of Python in the form of integration tests, Pytests, Canary tests, and Hydra tests, which kept the APIs that run the store functioning smoothly.

●Designed and reviewed design documentation for many new services within the team which were built upon the AWS resources.

●Used GitHub and GitLab for version control of the code.

●Participated in code reviews and raised the standard of the code by giving useful comments.

●Set up A/B testing using webab for the release of new features.

●Parallelized connections from AWS Lambda to the Redis cluster to reduce redundancy and latency in the system.

●Experience deep diving into AWS CloudWatch logs and writing custom scripts to gather relevant information from AWS logs.

●Experience retrieving data directly from the Redis cache using CLI (command-line) tools.

●Created necessary alarms for the systems and tuned them if necessary based on the SLA of the system.

●Experience creating custom AWS CloudFormation stacks and setting up load balancers for clients.

●Wrote appropriate documentation for the code and the features released.

●Experience working on and leading cross-functional team projects and coordinating with product managers (PMs).

●Used AWS SageMaker to run analytics experiments on customer data and gain insight into what could increase sales.

●Used Coral Server to test the APIs and improve their performance.

Environment: Python, REST API, Java, JUnit, pytest, AWS Infrastructure, DynamoDB, S3, Slack, Git, GitHub, IntelliJ, CI/CD, Linux, JSON, Docker.

AbbVie Inc., Bangalore, India June 2017 – Aug 2019

Role: Data Engineer

●Responsible for building scalable distributed data solutions in a Hadoop cluster environment with the Hortonworks distribution.

●Converted raw data to ORC file format to reduce data processing time and improve the efficiency of data transfer over the network.

●Worked on building end-to-end data pipelines on Hadoop Data Platforms.

●Applied normalization and denormalization techniques for optimum performance in relational and dimensional database environments.

●Designed, developed, and tested Extract Transform Load (ETL) Informatica applications with different types of sources.

●Created files and tuned SQL queries in Hive using HUE. Implemented MapReduce jobs in Hive by querying the available data.

●Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.

●Extensively worked with MySQL for identifying required tables and views to export into HDFS.

●Involved in designing HDFS storage to maintain an efficient number of block replicas of the data.

●Responsible for creating Hive tables on top of HDFS and developed Hive Queries to analyze the data.

●Staged data by persisting to Hive and connected Tableau with Spark cluster and developed dashboards.

●Implemented UDFs, UDAFs, and UDTFs in Python for Hive to handle processing that Hive's built-in functions cannot perform. Used Hive to analyze the partitioned, bucketed data and compute various metrics for reporting.

●Responsible for automated deployment of Python applications on Tomcat Server using Ansible scripts.

●Created Hive tables on HDFS to store data processed by Apache Spark on the Cloudera Hadoop cluster in Parquet format.

●Built custom ETL workflows using Spark/Hive to perform data cleaning and mapping.

●Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.

●Supported the cluster and its topics through Kafka Manager. Handled CloudFormation scripting, security, and resource automation.

Environment: Python, HDFS, MapReduce, Flume, Kafka, Apache Spark, Zookeeper, Hive, HQL, HBase, ETL, Web Services, Linux RedHat, Unix.

Knackease, Hyderabad, India Apr 2015 – May 2017

Role: Hadoop Developer

●Used Microsoft Visio and Rational Rose to design the Use Case Diagrams, Class models, Sequence Diagrams, and Activity diagrams for the SDLC process of the application.

●Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.

●Extracted the data from Teradata into HDFS using Sqoop.

●Analyzed the data by running Hive queries and Pig scripts to understand user behavior.

●Exported the patterns analyzed back into Teradata using Sqoop.

●Monitored and managed the Hadoop cluster through Cloudera Manager.

●Installed the Oozie workflow engine to run multiple Hive jobs.

●Developed Hive queries to process the data and generate the data cubes for visualizing.

●Coordinated with business users to design new reports that met their needs effectively and efficiently while building on existing functionality.

Environment: Hadoop, HDFS, MapReduce, Hive, UNIX, Sqoop, ETL, Pig Script, Cloudera, Oozie.

Dell Technologies, Bangalore, India Apr 2013 – Mar 2015

Role: SQL Engineer

●Participated in designing database schemas so that relationships between data are enforced by tightly bound key constraints.

●Wrote PL/SQL stored procedures, functions, packages, triggers, and views to implement business rules at the application level.

●Extensive experience with Data Definition, Data Manipulation, Data Query, and Transaction Control Languages.

●Loaded data into Oracle tables using SQL*Loader.

●Experience installing, upgrading, and configuring Microsoft SQL Server and migrating data between different versions of SQL Server.

●Experience designing and creating Tables, Views, User Created Data Types, Indexes, Stored Procedures, Cursors, Triggers, and Transactions.

●Developed warmup programs to load recently logged-in user profile information into MSSQL DB.

●Performed manual testing and used tools like Splunk and PuTTY to read the application logs for Elasticsearch.

●Configured the data mapping between Oracle and SQL Server and tested performance- and accuracy-related queries under SQL Server.

●Hands-on experience optimizing SQL statements and PL/SQL blocks by analyzing SQL execution plans; created and modified triggers, SQL queries, and stored procedures for better performance.

●Responsible for SQL Server installation, database design, and creation of schema objects like tables and indexes.

●Gathered requirements by interacting with business users, mapped them to the design, and implemented it following Agile development methodology.

Environment: PL/SQL, ETL, Oracle, Splunk, MySQL, SQL Server, JMS, Stored Procedures, Triggers.
