
Data Engineer Scientist

Chantilly, VA
July 30, 2022




Solomon A. Kidanu

***** ********* ******* *****

Stone Ridge, VA 20105

Phone : (571) 398 - 8093

Email :


• Proficient data engineer with four years of experience in the health industry and more than 10 years of experience in research, project work, and teaching focused on data science, data modeling, data warehousing, ETL, data analysis, machine learning, and data visualization.

• Solid background in computer science, statistics, and mathematics.

• Well-versed in relational database management systems (RDBMS), including MS SQL Server, MySQL, and Oracle, with exposure to NoSQL databases such as MongoDB.

• Proficient in SQL, T-SQL, and PL/SQL (stored procedures, functions, triggers, cursors, and packages), and in creating ETL processes.

• Built SSIS packages for various ETL tasks.

• Experienced in Python for ETL tasks and in Python libraries for machine learning projects, such as Pandas, NumPy, Scikit-Learn, SciPy, Matplotlib, and Seaborn.

• Experienced in Power BI and Tableau for BI and data visualization projects.

• Exposure to cloud environments like Azure and AWS

• Experienced in version control tools such as Git and Bitbucket.

• Extensive experience in agile software development methodology.

• Healthcare domain knowledge, specifically Medicare, Medicaid, and care and payment systems.

• Experienced in working in multicultural and interdisciplinary teams, and recognized for strong teamwork skills, work ethic, and integrity.

• Excellent communication and presentation skills with the ability to present and translate complex information to non-technical business audiences.


1. Ph.D. in Computer Science 2016

University of Pau and Adour Countries Pau, France

2. M.Sc. in Computer Science 2008

Addis Ababa University Addis Ababa, Ethiopia

3. B.Sc. in Information Systems 2004

Addis Ababa University Addis Ababa, Ethiopia


Work Experience

Burgess - A HealthEdge Company Jan. 2018 – Present

Senior Data Engineer (Jan. 2020 – Present)


• Working with the production DBA to make sure that production data is the correct source of truth.

• Integrating data from various sources, such as the American Medical Association (AMA) and third-party vendors, into the Burgess Group data infrastructure.

• Managing the flow of data, transforming raw data into useful information, and handling all schema changes in the organization.

• Building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources such as flat files, Excel, Access, PDF, XML, JSON, EDI, etc.

• Creating and maintaining complex T-SQL scripts, stored procedures, functions, views, triggers, CTEs, recursive table expressions, etc. to handle large and complex datasets.

• Employing a variety of tools, including SSIS and Python modules, to move data and build ETL processes.

• Producing and maintaining optimal data pipeline architecture for various Burgess applications, providing channels and processes that push data to various applications, and ensuring the integrity of the data.

• Implementing best practices for data updates and development, and troubleshooting performance and other data-related issues on multiple product applications.

• Automating data processes and optimizing data delivery.

• Developing reports on the data dictionary, server metadata, and user data using reporting tools.

• Working with stakeholders, including Product, Payment Policy, DBAs, QA, and Client Services, to assist with data-related technical issues and support their data infrastructure needs.

• Ensuring that the internal data repository is organized and up to date.

• Working daily within the Agile Scrum software development methodology and updating time data within sprint cycles to ensure consistent and accurate delivery of healthcare data and software.

• Training and coaching junior data engineers.


Data Scientist Jan. 2018 – January 2020


1. Medical Claim Denial Management Using Machine Learning Techniques

o Goal: Predict and prevent errors in medical claims payment processing. Specifically:

• Predict, prior to submission, whether a specific claim will be processed correctly and receive full payment, or be processed incorrectly and be declined or not paid in full.

• Detect regularities in incorrectly processed claims.

o Approach - Supervised Machine Learning for Classification & Anomaly Detection

o Algorithms - Two-Class Support Vector Machine, Random Forest, Logistic Regression, Decision Tree, PCA-Based Anomaly Detection

o Scripting language - Python (Pandas and Scikit-learn libraries)

o Environment - Azure Machine Learning Studio

2. Predictive Analytics in Health Insurance Claims

The goal of this project is to use predictive analytics to reduce claim duration and costs.

By using predictive analytics to sift through millions of data sets, claims specialists can uncover complex patterns, trends and correlations that otherwise would pass undetected.

Machine learning algorithms are employed to make sense of these associations and put this knowledge in the right hands quickly.

The model and deployment from this project will provide insights about the likelihood of future claim events, ferret out instances of fraudulent claims activity, and help ensure appropriate treatments. They can also be applied to detect possible fraud, waste, and abuse.


University of Pau and Adour Countries, France Sept. 2012 – Aug. 2017

Researcher in Data Science

Designed and developed a framework and a tool for collaborative data selection, aggregation, manipulation, and visualization from heterogeneous social media sources.

Developed a digital ecosystem for better management of multimedia contents in collaborative environments.

Designed and developed an event detection and extraction framework for social media.

Developed a generic and comprehensive agent-based ontology for multimedia data.

Created custom reports and dashboards that provided a unique visualization of the research results.

Presented and published the outcomes of the above research at international conferences.

Addis Ababa University, Ethiopia Mar. 2005 – Sept. 2012

Researcher and Lecturer in Computer Science

Designed and implemented a prediction model for a part-of-speech (POS) tagger for a natural language.

Developed an artificially intelligent football coaching system.

Implemented a mobile-based booking and self-checking system for Ethiopian Airlines, Ethiopia.

Project lead for the GPS-enabled Addis Taxi Routing Controlling System for Addis Ababa City, Ethiopia.

Developed a web-based scheduling system for Addis Ababa University, Ethiopia.

Taught various programming, algorithms, database, artificial intelligence, and networking courses to undergraduate students of computer science, statistics, and mathematics.

Advised and supervised a large number of graduate class projects for computer science students.

Served as assistant dean of the College of Natural Sciences, and as chair and member of various departmental committees.

Skills and Tools

• Programming & Scripting languages: C++, Java, Python, R, Perl, Prolog, MATLAB

• Database Software Systems: Microsoft SQL Server, MySQL, Oracle, MS-Access

• Statistical Analysis Programs: SPSS, SAS, MS-Excel

• Data visualization tools: Power BI, Tableau

• Big Data Technologies and Tools: Hadoop

• Cloud platform: Microsoft Azure, AWS

• Version control tools: Git, Bitbucket

• Mathematical and statistical knowledge: calculus, linear algebra, time series, regression, correlation, etc.

• Scientific Research Methodology

• Software Engineering & Software Project Management

Publications

Book: Neural Network and Rule-Based Methods for Part-of-Speech Tagging: Hybrid Approach

Selected articles published in high-ranking international conference proceedings:

• Multimedia Digital Ecosystem - New Platform for Collaboration and Sharing

• A Multimedia-oriented Digital Ecosystem: a New Collaborative Environment

• MAS2DES-Onto: Ontology for MAS-based Digital Ecosystems.

• Onto2MAS: An Ontology-Based Framework for Automatic Multi-Agent System Generation.

• Event Extraction for Collective Knowledge in Multimedia Digital Ecosystem.

* References will be available upon request
