Data Analysis and Enrichment

Data enhancement, statistical analysis, machine learning and deep learning models.

Data Cleaning

Once the data has been collected, the cleaning, enhancement and analysis work can begin. Data cleaning is the process of normalizing and standardizing the data. For text data, this can be done with linguistic rules that map disparate values onto standard values or extract certain features from full-text data for later use as facets during search. For master data, such as names and addresses, the data should be standardized using fault-tolerant matching against available reference data to improve its correctness and completeness.
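
The two cleaning steps described above can be sketched in a few lines of Python. This is only an illustration using the standard library's difflib module, not the matching technology used in production; the rule table, the reference list, the function names and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical rule table: known variants mapped onto one standard value.
COUNTRY_RULES = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "great britain": "United Kingdom",
    "deutschland": "Germany",
}

# Hypothetical reference data for fault-tolerant matching.
REFERENCE_CITIES = ["Konstanz", "Stuttgart", "Munich", "Hamburg"]


def normalize_value(raw, rules):
    """Map a raw value onto its standard form using a simple lookup rule.

    Whitespace is collapsed and case is ignored; unknown values are
    returned trimmed but otherwise unchanged.
    """
    key = " ".join(raw.strip().lower().split())
    return rules.get(key, raw.strip())


def match_reference(raw, reference, cutoff=0.8):
    """Return the closest reference entry, or None if nothing is similar enough.

    The cutoff of 0.8 is an assumed threshold and would need tuning
    against real data.
    """
    lowered = {name.lower(): name for name in reference}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


if __name__ == "__main__":
    print(normalize_value("  U.K. ", COUNTRY_RULES))
    print(match_reference("Stutgart", REFERENCE_CITIES))
    print(match_reference("Berlin", REFERENCE_CITIES))
```

Returning None for values that have no sufficiently close reference entry keeps unmatched records visible for manual review instead of forcing them onto a wrong standard value.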

Data Analysis

Analysis work usually starts with producing standard statistics over the data, including count, minimum, maximum, average and standard deviation. This gives a first impression of the data and a feel for it. More advanced analysis involves extracting patterns from the data based on feature analysis and finding correlations in the data, using a combination of techniques ranging from principal component analysis to deep learning.
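
As a rough illustration of this first pass, the sketch below computes the standard statistics for one numeric column and a Pearson correlation between two columns, using only the Python standard library. The function names and the sample values are made up for the example, and the sample (not population) standard deviation is an assumed choice.

```python
import math
import statistics


def summarize(values):
    """Return count, minimum, maximum, average and standard deviation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        # Sample standard deviation; use statistics.pstdev for the
        # population version instead.
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length (at least 2)")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A correlation close to 1 or -1 between two columns is a hint that one of them may be redundant as a feature, which is the kind of structure that techniques such as principal component analysis then exploit more systematically.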

Some of the tools we support include:

  • Enrichment – Exorbyte Matchmaker
  • Deep Learning – TensorFlow
  • Text Learning – Mallet

