How to import data into the Hadoop Distributed File System?
The basis of Xillio’s Hadoop import connector is a single unified content model, in which data is stored during the migration process. Besides the content model and its supporting database (MongoDB), Xillio uses the HDFS REST interface for this connector.
The metadata of the imported content resides in the content model, while the documents themselves are stored locally on the file system. Xillio first maps the extracted documents’ metadata to Hadoop’s format, after which the import runs automatically.
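The two-step upload that the HDFS REST interface (WebHDFS) uses can be sketched as follows. This is a minimal illustration, not Xillio’s connector code: the namenode host, HDFS path, and `upload` helper are hypothetical, and the redirect handling assumes the standard WebHDFS flow where the namenode answers the first PUT with a 307 redirect to a datanode.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical namenode address; WebHDFS commonly listens on the HTTP port.
NAMENODE = "http://namenode.example.com:9870"

def webhdfs_create_url(hdfs_path, user, overwrite=True):
    """Build the WebHDFS CREATE URL for a target HDFS path."""
    query = urllib.parse.urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return f"{NAMENODE}/webhdfs/v1{hdfs_path}?{query}"

def upload(local_path, hdfs_path, user="hdfs"):
    """Two-step WebHDFS upload: first PUT to the namenode, which replies
    with a 307 redirect naming a datanode; the file body goes in the
    second PUT to that datanode."""
    try:
        # urllib raises HTTPError for a redirected PUT, so we catch it
        # to read the datanode Location header.
        urllib.request.urlopen(
            urllib.request.Request(webhdfs_create_url(hdfs_path, user),
                                   method="PUT"))
        return
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        datanode_url = e.headers["Location"]
    with open(local_path, "rb") as f:
        req = urllib.request.Request(
            datanode_url, data=f.read(), method="PUT",
            headers={"Content-Type": "application/octet-stream"})
        urllib.request.urlopen(req)  # 201 Created on success
```

A migration tool repeats this call per document, writing each file’s metadata to the content model alongside the HDFS path it was stored under.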
Note: Structured data can be stored in an SQL database, in tables with rows and columns. Semi-structured data is information that doesn’t reside in a relational database but does have some organizational properties that make it easier to analyze; with some processing, semi-structured formats such as XML can be stored in a relational database. Unstructured data often includes text and multimedia content: e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages, and many other kinds of business documents.
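To make the semi-structured case concrete, here is a small sketch of how XML can be flattened into a relational table after some processing. The input document, table layout, and `load_into_sqlite` helper are illustrative assumptions, not part of Xillio’s product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: each <document> carries a few
# known attributes plus a nested title element.
XML = """
<documents>
  <document id="1" author="alice"><title>Q1 report</title></document>
  <document id="2" author="bob"><title>Press release</title></document>
</documents>
"""

def load_into_sqlite(xml_text):
    """Flatten the XML into rows and store them in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER, author TEXT, title TEXT)")
    for doc in ET.fromstring(xml_text).iter("document"):
        conn.execute("INSERT INTO documents VALUES (?, ?, ?)",
                     (int(doc.get("id")), doc.get("author"),
                      doc.findtext("title")))
    return conn

conn = load_into_sqlite(XML)
rows = conn.execute(
    "SELECT id, author, title FROM documents ORDER BY id").fetchall()
# rows == [(1, 'alice', 'Q1 report'), (2, 'bob', 'Press release')]
```

Truly unstructured content (videos, photos, free-form text) has no such recoverable schema, which is why a store like HDFS, rather than a relational table, is the natural target for it.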
Hadoop’s HDFS is specifically designed to store large quantities of data and stream it at high speed to user applications. It is well suited to big data processing and has therefore gained popularity over the last few years.
Thanks to Xillio’s connector, importing data, content, or documents from any source system into Hadoop has never been easier. Using Xillio’s import connector, data is imported from our unified data model into HDFS in a uniform manner, without loss of quality. Our UDM is a database-independent data model that can read and interpret any unstructured data.
Watch the demo video to see an example of an import of data into Hadoop HDFS.
Example of a data import into Hadoop using Xillio