Carlos Paradis

Data Scientist

I am a data scientist employed by KBR as a federal contractor for NASA. I hold an MS and a PhD in Computer Science from the University of Hawaii at Manoa, and an MS in Software Engineering from the Stevens Institute of Technology. I am also an honor member of IEEE-HKN and ACM-UPE through these institutions, a lifetime member of ACM, and a recipient of the Science Without Borders scholarship.

94035, Mountain View, California, USA
GitHub: carlosparadis
ORCID: 0000-0002-3062-7547



Applied Researcher in Natural Language Processing - Federal Contractor at KBR - NASA Ames Research Center

Applying Natural Language Processing (NLP) technologies to streamline the development and certification process for NASA missions


  • Develop an NLP capability to parse mission documents using machine learning techniques
  • Identify potential problems (such as contradictions or safety issues)
  • Evaluate the results with the help of mission developers
  • A list of publicly released and published work can be found here:

Graduate Research at University of Hawaiʻi at Manoa - Shidler School of Business

Create a system to analyze the evolution of software vulnerabilities by mining software repository ecosystems


  • Identify prospective methods to detect poor software development processes in collaboration, coordination, and communication
  • Identify prospective methods to detect poor architectural practices
  • Create a method to construct timelines of software vulnerabilities
  • Perform a case study of OpenSSL applying the identified methods, focusing on Heartbleed
  • Create an R package implementing the final methodology, with reproducible results
  • Docs/Demo:
  • Conference Article:
  • Pre-Print:

Graduate Research at University of Hawaiʻi at Manoa - Shidler School of Business

Identify novel safety incidents based on textual narratives for NASA’s Aviation Safety Reporting System (ASRS)


  • Definition and analysis of the performance, stability and random effects of unsupervised text clustering algorithms
  • Definition of a protocol and design of survey heuristics to assess the stability of machine learning results
  • Identify published methodologies for interpretation of text clustering results
  • Define an IRB compliant survey protocol for evaluation of different text clustering interpretation methodologies
  • Reuse and improvement of legacy code, and creation of data pipelines to supply data visualization tools
  • Demos: 1) ; 2)
  • Create an R package implementing the final methodology, with reproducible results
  • Docs/Demo:
  • Conference Article:

Intern/Data Analyst at Staffing Solutions at Kaiser Permanente


  • Designed and enhanced databases for predictive and prescriptive analytics
  • Designed and evaluated the effectiveness of statistical, probabilistic, and machine learning models
  • Elicited requirements, goals, and priorities from clinical and technical subject matter experts
  • Participated in the development of potential solutions, including outcome and process measures, and technical specifications
  • Translated needs, issues, and ideas into improved processes for patient care
  • Formulated specific implementation plans and evaluated the effectiveness of actions/programs implemented
  • Communicated results/recommendations to project sponsors, clients, and various senior-level audiences

Graduate Research at University of Hawaiʻi at Manoa & University of Maryland

Identified CVE-related software vulnerability discourse on online mailing lists using topic modelling


  • Created a Python crawler to parse HTML mailing lists
  • Created a taxonomy to classify pre-processing strategies for the raw corpus
  • Code-reviewed student work via GitHub pull requests
  • Performed topic modelling using R and LDA
  • Conference Article:

Graduate Research / Data Engineer at University of Hawaiʻi at Manoa - School of Architecture

Developed Python data pipelines for sensor data processing and database storage, and R code for thermal comfort analyses. Performed Unix server administration duties to support architecture researchers.


  • Simplified knowledge management and workflow between supervisor and students from architecture, engineering, and computer science
  • Proposed and code-reviewed a new plug-in architecture in Python for the lab's continuous sensor data collection, Lonoa
  • Implemented and explained model constraints for the ASHRAE-55 PMV/PPD and adaptive thermal comfort models
  • Provided system administrator support for the lab (SSH access, Unix package management, and security)
  • Proposed, designed, and code-reviewed two Raspberry Pi environmental projects (awarded Intel NCS funding)
  • Designed the database schema and code-reviewed survey station code for environmental experiments
  • Mentored undergraduate and master's student projects
  • Conference Article: (ARCC 2019 Conference Proceedings)

Intern II at SGT at NASA Ames Research Center

Modeled the attack surface of military drone communication by inserting a ‘liar’ script at the communication boundary in OpenUxAS, developed by the Air Force Research Laboratory.


  • Conference Article:

Graduate Research at University of Hawaiʻi at Manoa & Siemens

Identified associations between software companies’ organizational structure and their software products.


  • Created scripts to calculate metrics from version control systems and issue trackers
  • Defined static metrics to evaluate graph motifs on code collaboration social networks and discussion social networks
  • Conference Poster:
  • Journal Article: To appear in IEEE Transactions on Software Engineering

Graduate Research at University of Hawaiʻi at Manoa & HECO

Performed data mining/analytics of weather-related data for renewable (solar) energy management in Hawaiʻi (Master's thesis).


  • Created a pipeline to collect data from various sensors across Hawaiʻi
  • Scraped associated metadata from websites and created several visualizations to identify missing or biased values across multiple years
  • Used clustering methods to address time series granularity, and probabilistic and linear models to forecast solar irradiance at different sites
  • Conference Article:

Graduate Research at University of Hawaiʻi at Manoa - Office of Public Health Studies

Analyzed cross-sectional surveys conducted in Ethiopia by the DHS Program to identify relationships between sanitation and malnutrition.


  • Performed systematic review (database search, snowballing, and inverse snowballing)
  • Pre-processed and cleaned the survey data, and performed correlation analysis of the variables of interest

Lecturer at Universidade Federal da Bahia, Brazil

Lecturer for the Information Systems major


  • Databases Lab
  • Data Structures
  • Information Retrieval
  • Material: I used a combination of lecture notes, LaTeX, “hand-made” TikZ figures, color, and other visual examples to illustrate abstract concepts for freshmen

Undergraduate Teaching Assistant at Universidade Federal da Bahia, Brazil

Teaching Assistant for the Computer Science major


  • Formal Languages and Automata
  • Programming Logic for Computer Science
  • Paradigms of Programming Languages
  • Activities included giving lectures under the professor's supervision, assisting with course material and grading, and organizing guest coding dojos

Undergraduate Research at University of Hawaiʻi at Manoa & Drexel University

Identified patterns of effort in software development.


  • Gathered source code and issue tracker data (websites, .xml, .csv, .xlsx) using Python and R, and stored it in PostgreSQL
  • Produced summary statistics and correlation analysis reports
  • Journal Article:

Undergraduate Research at Universidade Federal da Bahia - Formas

Identified patterns associated with STEM student retention at Universidade Federal da Bahia.


  • Gathered and processed data (PDF format transcripts)
  • Created the schema and data population pipeline for a PostgreSQL database
  • Performed exploratory data analysis, association rule learning, and literature review
  • Conference Article:

Undergraduate Research at Universidade Federal da Bahia

Tracked and analyzed programmers' Eclipse IDE usage behavior to better understand the impact of conceptualization on god class detection.


  • Captured point-and-click logs using Eclipse plugins
  • Collected and processed sequence data (.csv)
  • Performed exploratory data analysis and process mining
  • Conference Article:


External Reviewer at Journal of Aerospace Information Systems (JAIS)

Shadow Program Committee Member at Mining Software Repositories (MSR)

Reviewer at IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

Program Committee Member at International Conference on Machine Learning and Applications (ICMLA)

Intel Nervana AI Academy Student Ambassador at Intel Nervana

Volunteer at Code for Hawaiʻi

I facilitate open-budget access for citizens of the state of Hawaiʻi, helping them make more informed decisions when choosing their state representatives by making candidates' financial ties more accessible. Records of candidates' campaign contributions, scattered across different official State of Hawaiʻi websites, are cleaned, linked through text matching, and combined on a single website for navigation.


External Reviewer at EDBT/ICDT 2016 Joint Conference

Volunteer at Open Knowledge Brazil

I volunteered as a data analyst for the Open Spending team of the Open Knowledge Brazil chapter, which was a finalist in the Google Impact Challenge Brazil. The project's goal was to analyze the fiscal-year budgets of the city and state of São Paulo, as well as the federal budget, serving as a case study for applying the same methods and tools to other cities. Mainly, the project sought to read between the lines of government budgets and understand where tax money was spent. My role on the team was to help write data stories, find and explore the data available on existing outdated websites, and help make it available through our website and/or API. In parallel, I also explored the availability of open spending data for Salvador, my hometown in Brazil, at both the city and state levels.


Volunteer at Science without Borders Network 'Rede CsF'

Rede CsF is a non-profit organization based in Brazil, created by students who were awarded the Science Without Borders scholarship by the Brazilian government. The non-profit serves as a return on that investment by creating projects in the country to improve science, technology, innovation, and education. Within the network, I collaborate on the Open Data Awareness Project and the Intranet, the latter used to coordinate the activities of over 40 contributors.


Chair of Student Chapter at Association for Computing Machinery

I founded the first and only ACM student chapter in Brazil. The chapter's activities initially focused on raising awareness and supporting local computer science events. The chapter was featured in XRDS, ACM's student magazine, in its first semester.



PhD in Computer Science from University of Hawaiʻi, Honolulu, HI - GPA: 4.0

MS in Computer Science from University of Hawaiʻi, Honolulu, HI - GPA: 4.0

MS in Software Engineering from Stevens Institute of Technology, Hoboken, NJ - GPA: 3.92

BS in Computer Science from Universidade Federal da Bahia, Brazil - GPA: 9.5 (0-10)


ICS Achievement Scholarship from University of Hawaiʻi at Manoa

Featured Idea and Intel NCS Funding from Intel

IEEE HKN from IEEE Honor Society HKN Delta Omega

Golden Key Honors Member from Golden Key International Honour Society

Featured Student Brazilian Award from Brazilian Computer Society

Science Without Borders - CAPES and LASPAU from Ciência sem Fronteiras

ACM UPE from ACM Honor Society UPE

In Search of Socio-Technical Congruence: A Large-Scale Longitudinal Study by IEEE Transactions on Software Engineering

Measured: Student Learning Through Monitoring Existing Buildings’ Energy Use And Occupant Comfort by Architectural Research Centers Consortium (ARCC)

Towards Explaining Security Defects in Complex Autonomous Aerospace Systems by AIAA SciTech 2019 Forum

Indexing Text Related to Software Vulnerabilities in Noisy Communities Through Topic Modelling by 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA 2018)

Conway: Law or not? by 2018 40th International Conference on Software Engineering (ICSE 2018)

Probabilistic Models for One-Day Ahead Solar Irradiance Forecasting in Renewable Energy Applications by International Conference on Machine Learning and Applications, Special Track on Machine Learning on Energy Applications (ICMLA 2015)

Manufacturing execution systems: A vision for managing software development by Journal of Systems and Software, Volume 101, Pages 59-68, ISSN 0164-1212

Mining Retention Rules from Student Transcripts: A Case Study of the Information Systems programme at a Federal University by Anais do Simpósio Brasileiro de Informática na Educação, v. 1, p. 1, 2013

An exploratory study to investigate the impact of conceptualization in god class detection by Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering - EASE ’13

Teaching Software Engineering Fundamentals in an Introductory Computer Programming Course by Fórum de Educação em Engenharia de Software (FEES), 2011, São Paulo


Languages and Frameworks
• R
• Python


mining software repositories
static code analysis
social network analysis
text mining