Data Persistence - The Framework Blog

Monday, April 8, 2019

Data Persistence



DATA PERSISTENCE
       The main role of an information system is to process data and convert it into information.

Data can be stored, read, modified, and deleted. For persistence, data must be kept in non-volatile storage. There are two main ways of storing data:
  • Files
  • Databases

Data can be stored in many formats, including plain text, XML, JSON, tables, and images.
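
As a minimal sketch of file-based persistence, the example below writes a record to a JSON file and reads it back. The record, the file name `student.json`, and the helper names `save_record` and `load_record` are assumptions made for illustration only.

```python
import json
import os
import tempfile


def save_record(path, record):
    """Write a record to a JSON file so it survives after the program exits."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)


def load_record(path):
    """Read the record back from the JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record and file location, used only for illustration.
record = {"id": 1, "name": "Alice", "courses": ["DB", "OS"]}
path = os.path.join(tempfile.mkdtemp(), "student.json")

save_record(path, record)
restored = load_record(path)
print(restored == record)
```

The same pattern applies to other text formats such as XML; only the serialization step changes.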

Data 

  • Data are raw facts.
  • They can be processed and converted into meaningful information.
Database
  • A database is a collection of information organized in such a way that a computer program can quickly select required pieces of data.
  • Databases are created and managed in database servers.
Database Server
  • Database server is the term used to refer to the back-end system of a database application using client/server architecture.
  • Performs tasks such as data analysis, storage, data manipulation, archiving, and other non-user specific tasks.
Database Management System
  • A database management system is a collection of programs that enables you to store, modify, and extract information from a database.
  • There are many different types of database management systems, ranging from small systems that run on personal computers to huge systems that run on mainframes.
      File systems and databases also differ in the level of service they are expected to provide. A database must be self-consistent at every instant and must provide isolated transactions and durable writes, whereas a file system gives much looser guarantees about consistency, isolation, and durability. Databases use sophisticated algorithms and protocols to implement reliable storage on top of potentially unreliable file systems. These algorithms make database storage more expensive in processing and storage costs, which is why a general file system remains an attractive option for data that does not need the extra guarantees of a database.
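
The consistency guarantee described above can be sketched with Python's built-in `sqlite3` module. The `account` table, the `transfer` function, and the balances are hypothetical. The point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the failing call the credit to `bob` has already been executed when the debit is rejected, yet the rollback undoes it. A plain file write offers no comparable guarantee.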

Data arrangement
1. Structured
      Structured data is the kind developers deal with most often. It covers any data that can be stored in an SQL database, in tables with rows and columns. Such data has relational keys and maps easily onto pre-designed fields. It is the most widely processed kind of data in software development and the simplest to manage.
     ex: spreadsheets and data from machine sensors.

2. Semi Structured
     Semi-structured data is information that does not reside in a relational database but still has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but the semi-structured form is kept to save space, improve clarity, or simplify computation.
    ex: XML and JSON documents; data in NoSQL databases is also considered semi-structured.
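
The processing needed to store semi-structured data in a relational database can be sketched as follows. The JSON records and the `person` and `phone` tables are hypothetical. Optional fields become NULL and nested lists become rows in a child table.

```python
import json
import sqlite3

# Hypothetical semi-structured input: the records share some fields but not all.
raw = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "phones": ["555-0100", "555-0101"]}
]
"""
people = json.loads(raw)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE phone (person_id INTEGER REFERENCES person(id), number TEXT)")

for p in people:
    # A missing "email" field is stored as NULL.
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?)", (p["id"], p["name"], p.get("email"))
    )
    # Each entry of the nested list becomes its own row.
    for number in p.get("phones", []):
        conn.execute("INSERT INTO phone VALUES (?, ?)", (p["id"], number))

person_rows = conn.execute("SELECT id, name, email FROM person ORDER BY id").fetchall()
phone_rows = conn.execute(
    "SELECT person_id, number FROM phone ORDER BY number"
).fetchall()
print(person_rows)
print(phone_rows)
```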

3. Unstructured
     Unstructured data is everywhere; most individuals and organizations conduct their lives around it. It is commonly estimated to account for around 80% of all data. Although the kinds of files listed below may have an internal structure, they are still considered "unstructured" because the data they contain does not fit neatly into a database.
    ex: e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages, and many other kinds of business documents.


Types of Databases
1. Hierarchical databases

      
      A hierarchical structure is simple but inflexible, because it is limited to one-to-many parent-child relationships. Hierarchical databases are widely used to build high-performance, high-availability applications, usually in the banking and telecommunications industries.
     A hierarchical database can be accessed and updated rapidly because its structure is like a tree and the relationships between records are defined in advance. However, adding a new field or record requires the entire database to be redefined.

2. Network databases

     Network databases are mainly used on large digital computers. They resemble hierarchical databases, but whereas a node in a hierarchical database can have only one parent, a node in a network database can have relationships with multiple entities. A network database looks more like a cobweb or an interconnected network of records.

3. Relational databases

    In a relational database, each table has a key field that uniquely identifies each row, and these key fields can be used to connect one table of data to another. Relational databases are the most popular and widely used databases. Popular relational DBMSs include Oracle, SQL Server, MySQL, and SQLite.
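
A small sketch of how key fields connect tables, using SQLite through Python's `sqlite3` module. The `department` and `employee` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,   -- key field that identifies each row
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id)  -- points at department.id
);
INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employee VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 1);
""")

# The join follows the key from each employee row to its department row.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.id = e.department_id
    ORDER BY e.id
""").fetchall()
print(rows)
```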




4. Object-oriented databases

    Object-oriented databases derive from the integration of object-oriented programming language systems with persistent storage systems. They use small, reusable pieces of software called objects, and the objects themselves are stored in the database.
    The main benefit of object-oriented databases is the ability to mix and match reusable objects, which makes them well suited to multimedia data.

5. Non-relational databases (NoSQL)
    

   NoSQL/Non-relational databases can take a variety of forms. NoSQL databases can be schema agnostic, allowing unstructured and semi-structured data to be stored and manipulated.

6. Graph databases
   Graph databases are NoSQL databases that use a graph structure for storing and querying data. The data is stored in the form of nodes, edges, and properties. A node represents an entity or instance and is equivalent to a record in a relational database system. An edge represents a relationship that connects nodes. Properties are additional information attached to the nodes.
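
The node, edge, and property model can be sketched with plain Python data structures. This is only an illustration of the data model, not of any particular graph database. The node ids, relationship names, and helper functions are hypothetical.

```python
# Nodes: id -> properties (hypothetical people and a company).
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "acme": {"label": "Company"},
}

# Edges: (source node, relationship type, target node).
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbors(node_id, relationship):
    """Return ids of nodes reachable from node_id over edges of the given type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]


def employees_of(company_id):
    """Follow WORKS_AT edges backwards to find who works at a company."""
    return sorted(
        src for src, rel, dst in edges if rel == "WORKS_AT" and dst == company_id
    )


print(neighbors("alice", "KNOWS"))
print(employees_of("acme"))
```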

7. Document databases


    Document databases are also NoSQL databases; they store data in the form of documents. Each document holds the data, its attributes, and its relationships to other data elements. Within a document, data is stored as key-value pairs.

Data Warehouse - A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources and helps generate analytic reports. By definition, a data warehouse is the data repository, built by a single process, from which analytic reports are produced.

Big Data - Big Data is mainly a technology, defined by the volume, velocity, and variety of the data. Volume is the amount of data coming from different sources, velocity is the speed at which data is processed, and variety is the number of types of data.

The application components communicate with files and databases using APIs.




SQL statements - Execute standard SQL statements from the application.

Prepared statements - The query only needs to be prepared once, but can be executed multiple times with the same or different parameters. 

Callable statements - Execute stored procedures.
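
The difference between a plain SQL statement and a statement executed repeatedly with parameters can be sketched with Python's `sqlite3` module, which uses `?` placeholders. The `product` table and its rows are hypothetical. SQLite has no stored procedures, so callable statements are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain SQL statement: executed exactly as written.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Parameterized statement: one SQL text, executed with different parameters.
insert_sql = "INSERT INTO product (name, price) VALUES (?, ?)"
for name, price in [("pen", 1.5), ("notebook", 4.0), ("bag", 25.0)]:
    conn.execute(insert_sql, (name, price))
conn.commit()

# The same query text is reused with two different price limits.
select_sql = "SELECT name FROM product WHERE price < ? ORDER BY name"
cheap = [row[0] for row in conn.execute(select_sql, (5,))]
all_items = [row[0] for row in conn.execute(select_sql, (100,))]
print(cheap, all_items)
```

Passing values as parameters instead of building the SQL string by hand also keeps user input from being interpreted as SQL.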
Object-relational mapping (ORM)
  • A technique for automatic conversion between the object model and the relational data model.
  • Developers work with objects, which are stored in a traditional relational database.
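
To show the idea behind ORM, here is a hand-written mapper in Python. It is a simplified sketch and not how Hibernate or JPA are implemented. The `Student` class, the `StudentMapper` class, and the `student` table are assumptions made for illustration.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Student:
    """Plain object that the application code works with."""
    id: int
    name: str
    grade: int


class StudentMapper:
    """Converts Student objects to rows of the student table and back."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS student "
            "(id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)"
        )

    def save(self, student):
        # Object -> row: the fields become column values, in declaration order.
        self.conn.execute(
            "INSERT OR REPLACE INTO student VALUES (?, ?, ?)", astuple(student)
        )

    def find(self, student_id):
        # Row -> object: the column values become constructor arguments.
        row = self.conn.execute(
            "SELECT id, name, grade FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        return Student(*row) if row else None


mapper = StudentMapper(sqlite3.connect(":memory:"))
mapper.save(Student(1, "Alice", 90))
found = mapper.find(1)
print(found)
```

An ORM tool generates this kind of mapping automatically from the class definition, so the application never writes the SQL itself.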


Why Object Model?
     - It is natural for object oriented programming language.
     - Working with Object Data Model is straightforward, friendly and easy.



POJO vs JAVA BEANS

- POJO: It has no special restrictions other than those imposed by the Java language. Java Bean: It is a special POJO that has some restrictions.
- POJO: It does not provide much control over members. Java Bean: It provides complete control over members.
- POJO: It can implement the Serializable interface. Java Bean: It should implement the Serializable interface.
- POJO: Fields can be accessed by their names. Java Bean: Fields are accessed only through getters and setters.
- POJO: Fields can have any visibility. Java Bean: Fields have only private visibility.
- POJO: It may have a no-arg constructor. Java Bean: It must have a no-arg constructor.

JPA
- Java EE standard for ORM (inspired by Hibernate).
- An entity is a lightweight POJO that can be freely passed between components, locally or remotely.

Object Relational Mapping (ORM) Tools

  • Java - Apache, Hibernate, JDO, QuickDB
  • .NET - NHibernate, EntitySpaces, Telerik OpenAccess
  • PHP - CakePHP, Qcodo


NoSQL

NoSQL databases are typically chosen to support:
- Large volumes of rapidly changing structured, semi-structured, and unstructured data.
- Agile sprints, quick schema iteration, and frequent code pushes.
- Object-oriented programming that is easy to use and flexible.
- Geographically distributed scale-out architecture instead of expensive, monolithic architecture.

Types of NoSQL databases

1. Key-value stores

      Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an associated value. Because the model is so simple, key-value databases can be highly scalable, which suits them to session management and caching in web applications. Examples include MemcacheDB and Redis.
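
The key-value model and its use for session management can be sketched with a toy in-memory store. This is not how Redis or MemcacheDB work internally. The `KeyValueStore` class, its `ttl` parameter, and the injectable clock are assumptions made so the example is easy to follow and test.

```python
import time


class KeyValueStore:
    """In-memory key-value store with optional per-key expiry (a toy example)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping

    def set(self, key, value, ttl=None):
        """Store value under key; ttl is an optional lifetime in seconds."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for key, or default if it is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return default
        return value


# Simulated clock: a one-element list whose value the test code advances by hand.
now = [0.0]
store = KeyValueStore(clock=lambda: now[0])
store.set("session:42", {"user": "alice"}, ttl=30)

before = store.get("session:42")  # still valid at t = 0
now[0] = 31.0                     # pretend 31 seconds have passed
after = store.get("session:42")   # expired, so the default (None) is returned
print(before, after)
```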

2. Document databases
      Document databases are also NoSQL databases; they store data in the form of documents. Each document holds the data, its attributes, and its relationships to other data elements. Within a document, data is stored as key-value pairs.

3. Wide-column stores
      Wide-column stores organize data tables as columns instead of as rows. They can be found among both SQL and NoSQL databases. Wide-column stores can query large data volumes faster than conventional relational databases and can be used for recommendation engines, catalogs, fraud detection, and other types of data processing. Google BigTable, Cassandra, and HBase are examples of wide-column stores.

4. Graph stores
      Graph databases are NoSQL databases that use a graph structure for storing and querying data. The data is stored in the form of nodes, edges, and properties. A node represents an entity or instance and is equivalent to a record in a relational database system. An edge represents a relationship that connects nodes. Properties are additional information attached to the nodes.


Hadoop

- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

- It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. 

Hadoop core concepts

- Hadoop Distributed File System : A distributed file system that provides high-throughput access to application data.
- Hadoop YARN : A framework for job scheduling and cluster resource management.
- Hadoop MapReduce : A YARN-based system for parallel processing of large data sets.
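
The MapReduce programming model can be sketched in plain Python with a word count. This runs on a single machine and does not use Hadoop's API. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# On a cluster, each document could be mapped on a different machine.
documents = ["big data big clusters", "data on clusters"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Because each map call looks at one document and each reduce call at one key, both phases can be spread across many machines, which is what lets the model scale.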

INFORMATION RETRIEVAL (IR)





TOOLS FOR INFORMATION RETRIEVAL (IR)
  • NLTK
  • Gensim
  • Matplotlib
  • Pandas



