2021 HPCC Systems Community Virtual Summit: Data Cataloging with Tombolo

HPCC Systems
Data Cataloging with Tombolo, Roger Dev & Jerry Jacob, LexisNexis Risk Solutions Group

It is easy for a Data Lake to grow out of control if appropriate measures are not put in place. When this happens, data engineers' productivity can suffer, resulting in delays in meeting customer commitments. A Data Lake can become a Data Swamp suddenly and without warning. The critical threshold is reached when the complexity of the Data Lake exceeds the ability of key personnel to hold the pattern of the Data Lake in their heads. The goal of Tombolo, a Data Lake curation tool, is to prevent such an event and allow the Data Lake to continue evolving rapidly as its complexity increases and as more personnel begin to participate. Tombolo provides the central operating environment for a Data Lake. The Tombolo Data Lake Curation System 1.0 is the first open-source Data Lake curation system for the HPCC Systems Platform. It allows documentation to be created alongside the data and analyses, providing a roadmap into all aspects (assets) of the Data Lake: data files, data providers and consumers, data ingestion and analytics, and user queries. Its global find facility allows users to rapidly locate any asset or browse hierarchically to get the lay of the land.

Copyright © 2021 LexisNexis Risk Solutions Group
