Navya N Kumar
Certified Databricks Developer
PROFESSIONAL SUMMARY:
•Over 9 years of experience working with the Databricks Lakehouse and migrating Hadoop and SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse. Excel at managing and granting data access, as well as migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
•Expertise in developing Spark applications using Spark SQL in Databricks. Successfully extracted, transformed, and aggregated data from multiple file formats, enabling analysis that uncovers valuable insights into customer usage patterns (a brief PySpark sketch follows this summary).
•Possesses a solid understanding of Spark architecture, including its core components such as Spark SQL, DataFrames, Spark Streaming, driver node, worker nodes, stages, executors, and tasks.
•Knowledge extends to Big Data Hadoop and YARN architecture, along with the various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
•A strong grasp of Hadoop cluster architecture, encompassing distributed file systems, parallel processing, scalability, fault tolerance, and high availability.
•Within Databricks, gained extensive experience in utilizing various Hadoop ecosystem tools, including HDFS, MapReduce, YARN, Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, Flume, and NiFi.
•Skilled in designing and implementing dimensional data models for fact and dimension tables using Star Schema and Snowflake Schema in Databricks.
•Demonstrated expertise in optimizing Hadoop cluster performance, debugging, monitoring, data transformation, mapping, cleaning, and troubleshooting in Databricks.
•Proficient in Python programming for Databricks, utilizing features such as Lists, Dictionaries, Tuples, the Pandas framework, and the boto3 package to read files from S3.
•Additionally, an excellent communicator with a strong work ethic; proactive, a team player, and maintains a positive attitude.
•Possesses domain knowledge in Finance, Logistics, and Health insurance.
•Skills extend to visualization tools such as Power BI and Excel, including formulas, Pivot Tables, Charts, and DAX Commands.
•Experience in Agile software development methodologies, including Scrum and sprint planning, as well as Waterfall and Test-Driven Development (TDD).
•Well-versed in UNIX, Linux, and Windows operating systems.
•Have a solid understanding of the Software Development Lifecycle (SDLC) and methodologies such as Waterfall and Agile.
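Below is a minimal, illustrative PySpark sketch of the multi-format extraction and Spark SQL aggregation pattern described in this summary; the paths, view, and column names are hypothetical placeholders rather than actual project assets.

    # Minimal sketch: read two file formats, register a temp view, and aggregate
    # usage events with Spark SQL; all paths and columns are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("usage-patterns-sketch").getOrCreate()

    clicks = spark.read.json("/mnt/raw/clicks/")                               # JSON events
    accounts = spark.read.option("header", True).csv("/mnt/raw/accounts.csv")  # CSV reference data

    clicks.join(accounts, "account_id").createOrReplaceTempView("usage")

    daily_usage = spark.sql("""
        SELECT account_id, to_date(event_ts) AS event_date, COUNT(*) AS events
        FROM usage
        GROUP BY account_id, to_date(event_ts)
    """)

    daily_usage.write.mode("overwrite").format("delta").save("/mnt/curated/daily_usage")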
EDUCATION: Bachelor of Science in Computer Science Engineering (Visvesvaraya Technological University, 2013)
Certification: Databricks Certified Data Engineer Associate
TECHNICAL SKILLS:
Cloud Technologies and Services
Amazon AWS: EMR, EC2, ENS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, Kinesis; Microsoft Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse; Google Cloud Platform
Big Data Ecosystems
Databricks Lakehouse, Apache Spark, HDFS, YARN, MapReduce, Sqoop, Hive, Oozie, Pig, ZooKeeper, Cloudera Manager, Kafka, Kafka Connect, Flume, NiFi, Airflow, StreamSets
Hadoop Distributions
Apache Hadoop 2.x, Cloudera CDP, Hortonworks HDP
Scripting Languages
Python, PySpark, Spark SQL, SQL, Scala, R, Shell scripting, HiveQL
NoSQL Database
Cassandra, MongoDB, HBase
Databases
MySQL, Oracle, Teradata, MS SQL Server, PostgreSQL, DB2
Version Control
Git, SVN
BI Tools
Tableau, Power BI
WORK EXPERIENCE:
3M Company, Saint Paul, MN Aug 2022 – Present
Role: Databricks Data Engineer
Responsibilities:
•Demonstrated expertise in implementing robust data security measures to safeguard sensitive information throughout the ETL pipeline process in Azure Databricks, using encryption, access controls, and data masking techniques to protect data integrity and privacy.
•Proficient in developing and maintaining data validation processes using Spark SQL and Databricks Workflows (see the validation sketch after this section).
•Skilled in designing and maintaining Azure data warehouses to efficiently store business datasets.
•Skilled in leveraging Python to build Azure Databricks ETL pipelines that meet specific business requirements.
•Proficient in writing and maintaining Python unit tests for Azure applications, utilizing Bitbucket and Jenkins.
•Successfully integrated Adobe Analytics with other marketing and analytics tools using Azure Databricks.
•Skilled in designing ETL Data Pipeline flows for data ingestion from RDBMS sources to Azure, including MySQL.
•Experienced in creating and maintaining data workflows with dependencies for efficient big data processing on Azure Databricks.
•Strong background in adhering to data governance best practices, utilizing Unity Catalog and version control on Azure.
•Proficient in providing datasets for building models developed by the data science team using Azure Databricks.
•Skilled in data modeling, statistical analysis, and data visualization on Azure Databricks.
•Capable of enhancing Tableau visualization load times by caching data with Data Reflections using the Azure Data Lakehouse engine.
•Experienced in optimizing resource-intensive SQL database queries to achieve significantly faster performance using Azure best practices.
•Skilled in designing reports using Azure Databricks warehouse endpoint to identify the latest hacking techniques prevalent in the market.
•Capable of designing complex SQL queries to validate data and establish a reliable data warehouse using Azure Databricks.
•Experienced in creating Azure Databricks workflows to ensure continuous job execution for efficient data processing.
•Proficient in developing and maintaining data catalogs in Azure Databricks to document data sources, metadata, and lineage information.
Environment: Azure Databricks, Teradata, SQL Server, ETL pipelines, Terraform, Apache Presto, Apache Drill.
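A minimal sketch of the kind of Spark SQL data validation step referenced above, assuming a hypothetical staging table and columns; in practice such a check would run as a Databricks workflow task.

    # Minimal data validation sketch; the table and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    checks = spark.sql("""
        SELECT
            COUNT(*) AS row_count,
            SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_ids
        FROM sales.orders_staging
    """).first()

    # Fail the workflow task early if basic expectations are violated.
    if checks.row_count == 0 or checks.null_ids > 0:
        raise ValueError(f"Validation failed: {checks.asDict()}")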
Discover Financial, Riverwoods, IL Nov 2020 – Aug 2022
Role: Databricks Data Engineer
Responsibilities:
•Utilized Spark Dataframes and Python to convert SQL queries into PySpark code.
•Analyzed SQL scripts and proposed PySpark solutions.
•Transformed MapReduce programs into Spark transformations using Python and Spark Dataframes.
•Designed and constructed modern data solutions on Azure PaaS services to enable data visualization and evaluated the impact of new implementations on existing business processes.
•Extracted, transformed, and loaded data from various source systems into Azure Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
•Processed data in Azure Databricks and ingested it into Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure Data Warehouse.
•Created pipelines in Azure Data Factory using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources like Azure SQL, Blob storage, Azure SQL Data Warehouse, and write-back tools.
•Developed PySpark and Spark-SQL applications for data extraction, transformation, and aggregation from multiple file formats, providing insights into customer usage patterns.
•Assumed responsibility for estimating cluster size, monitoring, and troubleshooting Spark Databricks clusters.
•Proficient in optimizing the performance of Spark applications, including batch interval time, parallelism level, and memory tuning.
•Implemented user-defined functions (UDFs) in PySpark to fulfill specific business requirements (see the UDF sketch after this section).
•Skilled in developing SQL scripts for automation purposes.
•Designed and implemented efficient data models in Databricks for data storage and processing.
•Optimized data storage and access through techniques like partitioning, bucketing, and indexing.
•Developed and maintained ETL pipelines in Databricks, extracting data from various sources, performing transformations, and loading it into target systems.
•Monitored and managed data pipelines to ensure high availability, reliability, and scalability of data processing.
•Collaborated with data scientists and business analysts to understand their data requirements and deliver data solutions in Databricks.
•Automated data workflows and conducted data quality checks using Databricks and other technologies.
•Established and maintained data governance policies and standards in Databricks, ensuring data accuracy, consistency, and security.
•Implemented machine learning pipelines in Databricks to develop predictive models and deploy them in production.
•Developed and maintained data catalogs in Databricks, documenting data sources, metadata, and lineage information.
•Troubleshot and resolved data-related issues in Databricks, ensuring the smooth functioning of data pipelines and systems.
Environment: Azure Data Factory (ADF v2), Azure Synapse Analytics, Azure Databricks, Azure SQL, Azure Cloud, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Python, Kubernetes, Azure SQL Server, Azure Data Warehouse.
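A minimal sketch of a PySpark user-defined function (UDF) of the kind mentioned above; the masking rule, table, and column names are hypothetical.

    # PySpark UDF sketch: mask all but the last four characters of an account number.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()

    @F.udf(returnType=StringType())
    def mask_account(acct):
        # Hypothetical business rule: keep only the last four characters visible.
        return None if acct is None else "*" * max(len(acct) - 4, 0) + acct[-4:]

    df = spark.table("card.transactions_raw")   # hypothetical source table
    df = df.withColumn("account_masked", mask_account(F.col("account_number")))
    df.write.mode("overwrite").saveAsTable("card.transactions_masked")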
Client: Barclays, New York City, NY March 2019 – Nov 2020
Role: Senior Data Engineer
Responsibilities:
•Demonstrated expertise in optimizing the performance of Spark applications, including setting the appropriate batch interval time, parallelism level, and memory tuning.
•Took ownership of estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
•Created Python APIs to capture array structures in the processor at failure points for debugging purposes.
•Proficient in Microsoft Azure components, such as Azure Data Factory, Azure Data Lake Storage, Azure SQL, Azure Databricks, HDInsight, and ML Service.
•Developed Spark applications using PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats, enabling analysis and transformation to uncover insights into customer usage patterns.
•Possess a solid understanding of Spark Architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver node, worker node, stages, executors, and tasks.
•Created Spark jobs using Scala to interact with PostgreSQL databases through the Spark SQL Context and accessed Hive tables using the Hive Context (a PySpark sketch of this pattern follows this section).
•Thoroughly understood client requirements and devised the most effective approach to meet customer use cases.
•Developed a hierarchical representation of the system by defining components and subcomponents using Python and created a library of functions tailored to user needs.
•Designed and developed data models for enterprise data lakes, supporting various use cases such as analytics, processing, storage, and reporting of large and rapidly changing data.
•Assumed responsibility for production support, on-call duties, debugging, and rapid problem-solving.
Environment: Azure Data Factory, Azure Data Lake Storage, Databricks, Spark, SQL Server, Python, CI/CD (Jenkins), Spark SQL, Scala, PySpark, PostgreSQL.
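A minimal PySpark sketch of the JDBC-plus-Hive access pattern described above (the original jobs were written in Scala); the connection details and table names are placeholders, and credentials would normally come from a secret scope.

    # Read from PostgreSQL over JDBC, join with a Hive table, and write back.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    postgres_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/analytics")   # placeholder host
        .option("dbtable", "public.customers")
        .option("user", "svc_user")
        .option("password", "***")                                   # use a secret scope in practice
        .load()
    )

    hive_df = spark.sql("SELECT * FROM curated.customer_events")     # hypothetical Hive table

    (postgres_df.join(hive_df, "customer_id")
        .write.mode("overwrite")
        .saveAsTable("curated.customer_enriched"))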
Accenture, India Jun 2013 – Dec 2018
Project 1: Nike – Data-Driven Decision-Making and Business Intelligence
Role: Data Analyst II
Responsibilities:
•Contributed to the future work backlog by creating agile user stories for the development of data analytics pipelines, including ETLs, orchestration, and grooming user stories with defined acceptance criteria.
•Demonstrated in-depth business system knowledge to support the digital transformation initiative of the Product Integrity organization, specifically focusing on manufacturing, freight, and flight reports.
•Designed and executed proof of concepts related to data acquisition, transformation, measurement, and key performance indicators (KPIs) based on identified business needs.
•Actively participated in Nike's AI/ML laboratory exchanges to generate value for the Product Integrity organization. Utilized the on-site laboratory for conducting in-house inspections of Nike products and performed customer sentiment analysis based on data analytic reports.
•Collaborated with Data Engineers and Data Scientists to groom and support the application systems involved in the end-to-end lineage of data. Took primary ownership of defining acceptance criteria for data acquisition requirements.
•Owned and documented the Data Governance and Data Quality standards for the data lake associated with Product Integrity and global manufacturing units of Nike worldwide.
Environment: Fully Agile, AWS, Python, SQL, Snowflake, Excel, Tableau.
Project 2: Myentry – Online Accounting Software
Role: Data Analyst II
Responsibilities:
•Developed Azure data pipelines, incorporating agile methodologies for incremental enhancements and user story grooming, with clearly defined acceptance criteria.
•Utilized knowledge of business and accounting systems to support a digital transformation initiative, focusing on Azure-based transactional data and financial reporting.
•Executed Azure-based proofs of concept for data acquisition, transformation, and measurement, aligning with identified business KPIs.
•Integrated AI/ML features into the data platform using Azure Machine Learning, facilitating predictive financial forecasting and trend analysis.
•Collaborated on the end-to-end data lineage in Azure, defining acceptance criteria for data acquisition requirements to ensure accurate financial data.
•Implemented Azure-based Data Governance and Data Quality standards, developing strategies for data integrity checks and anomaly handling.
Environment: Agile Greenfield and Test-Driven Development. Azure Databricks, Python, SQL, PowerShell, YAML, Excel
Project 3: Oncore Electric – Utility and Electricity Delivery
Role: Senior Database Developer I
Responsibilities:
•Collaborated as an SQL Developer, building strong relationships with end users and technical resources across established projects.
•Supported database objects (stored procedures, triggers, views) and conducted code reviews in close coordination with the development team.
•Ensured proper design and implementation of database systems by working closely with application developers.
•Performed database administration tasks, including performance monitoring, access management, and storage management.
•Partnered with development teams and vendors to generate reports on ongoing software development projects.
•Led SQL development and the development of SSIS/SSAS/SSRS solutions.
•Collaborated with other developers to facilitate the creation of high-performing, data-centric components and applications.
•Developed and maintained SQL processes for data warehousing of channel performance data.
•Worked closely with the development manager and an on-site team to deliver effective solutions.
Environment: SQL, SSAS, Stored Procedures, Triggers, Java, Unit Testing