Indexing and Hashing in DBMS

Indexing and Hashing in DBMS

Overview

A database is a set of associated data. A DBMS, or Database Management System, simplifies the creation and management of data in databases. Users can create SQL queries to perform operations on database tables.

Multiple users can access and use data with DBMS. It also allows performing transactions and protects user data. Indexing and hashing are two important concepts of DBMS.

Indexing improves database efficiency by reducing the number of disk accesses required to process queries, whereas hashing determines the direct location of a data record on a disk without the use of an index structure.

In this article, we will be going to discuss more about Hashing and Indexing in a database management structure.

What is Indexing in DBMS?

Relational databases are databases that are stored as tables, or rows and columns. However, if we need to search for data on the disc, we need to perform a sequential search (linear search) by traversing many rows.

As a result, it is an extremely inefficient method of searching for desired data. Thus, by changing the method of storing data, we can make searching more efficient. This is where indexing comes in.

Indexing is a data structure technique that improves database performance by decreasing the number of disk accesses required when executing a query.

An index is a type of data structure. It is used to quickly identify and access data and information in a database table.

Indexing helps you to easily retrieve records from a database file. It is performed by creating an Index-table or Index.

An index is a small table with only two columns that are each a key-value pair. Copies of specific columns from the tabular data of the database are contained in the two columns of the index table (i.e., the key-value pair).

Example of Indexing

Indexing in a database management system is very much similar to indexing in a textbook. Consider the textbook "Learn Data Structures and Algorithms".

There will be numerous Data structures and Algorithms in this textbook, such as the array, stack, queue, linked list, DFS, BFS, knapsack, and so on. Assume we want to study the knapsack algorithm in Dynamic Programming.

If no indices are specified in the book, we will have to go through every page until we find our desired topic. However, if the indices are mentioned in the book, we can jump right to the page where the knapsack method is discussed.

Structure of Index

Let us first look at what happens in memory to better grasp indexing in DBMS. So, first and foremost, an index table is generated. This index table can have several columns but the main 2 attributes of this index table are "Search Key" and "Data Reference".

  • The search key is the first column in the database that contains a duplicate or copy of the table's candidate key or primary key. The primary key values are saved in sorted order so that the associated data can be accessed easily.

  • The second column in the database is the data reference. It comprises a collection of pointers that point to the disk block containing the value of a specific key.

So, now that we have a basic understanding of indexing in DBMS, let us look at the attributes of indexing in DBMS.

Attributes of Indexing in DBMS

The following are the attributes of indexing in a database management structure:

  • Access Types: It refers to the type of access, such as value-based access or range-based search, etc.

  • Access Time: It is the time taken to access (locate) a specific data or data element from the complete data set stored on the disc.

  • Insertion Time: When inserting data into a disc (memory), it is necessary to first identify an adequate empty space in which there is sufficient space to insert the data and then insert the data. So, insertion time is the time taken to identify an adequate empty space to insert the data.

  • Deletion Time: The deletion operation begins with first locating the data to be deleted. The data is then deleted, and the index structure is modified.

    So, deletion time is the time it takes to find the data (access time) plus the time it takes to delete the data and adjust the index structure.

  • Space Overhead: As previously discussed, an index table is formed using at least two attributes, which are the key and the block pointers.

    As a result, this index table will take up some space, which will be in the acquired memory (or disc) where the data is kept. This extra space required to hold the index table or index structure is referred to as space overhead.

What is Hashing in DBMS?

It will be very inefficient and hard to search all index values through all levels of a large database structure and then get to the desired data block to acquire the required data.

Hashing is a technique used for calculating the direct position of a data record on a disc without the usage of an index structure. In other words, it is a way of searching for data on a disc without using an index structure.

The hashing method is mostly used to index and retrieve data in a database since searching for a specific item using a shorter hashed key rather than the original value is faster.

Hash functions using search keys as parameters are used to generate the actual address of a data record. To generate the address, a hash function can use any of the column values.

The hash function generates the address of the data block using the primary key. A hash function is a simple mathematical function that can be used to solve any complex mathematical function.

The primary key can also be considered as the address of the data block. That means each row whose address is the same as a primary key is stored in the data block.

Need of Hashing in DBMS

Here, are a few DBMS situations where we need the Hashing technique:

  • It's difficult to search all of the index values through all of the levels of a large database structure, and then you have to go to the target data block to acquire the needed data.

  • The hashing technique is used to index and retrieve things in a database since it is faster to search for that specific item using the shorter hashed key rather than its original value.

  • Hashing is an efficient technique for determining the direct location of a data record on a disc without the use of an index structure. It is also a useful way for implementing dictionaries.

Attributes of Hashing in DBMS

The following are the attributes of hashing in a database management structure:

  • Data bucket (Unit of Storage): Data is stored in data blocks whose addresses are generated using the hashing function. These records are stored in memory in data buckets or data blocks.

  • Key: A key is an attribute or a group of attributes that help us uniquely identify a row (or tuple) in a table (or relation). A key is also used to develop relationships between the different columns and tables of a relational database. Individual values in a key are typically referred to as key values.

  • Hash function: A hash function is a simple mathematical function that can map an indeterminate size of data into a fixed size of data. Alternatively, it provides a numerical quantity that represents the input data.

  • Linear Probing: Linear probing uses a fixed interval between probes. Instead of overwriting the previous record, the next available data block is used to enter the new record.

  • Quadratic Probing: It helps you in determining the new bucket address. It allows you to add Intervals between Probes by adding the subsequent output of a quadratic polynomial to the initial value given by the original computation.

  • Hash Index: It is a data block address. A hash function can range from a simple mathematical function to a complex mathematical function.

  • Double Hashing: Double hashing is a computer programming technique that uses a secondary hash of the key as an offset when a collision occurs in hash tables in conjunction with open addressing to resolve hash collisions.

  • Bucket Overflow: Bucket skew occurs when some buckets are allotted more records than others, resulting in bucket overflow. Skew arises when multiple records have the same search key or when the hash algorithm used is not uniform. This is a fatal stage for any static hash function.

Conclusion

  • Indexing is a data structure technique used for quickly fetching entries from database files using some indexed attributes. Indexing in a database system is similar to indexing in books. Indexing is defined using indexing attributes.

  • Hashing uses mathematical methods known as hash functions to generate direct locations of data records on the disc, whereas indexing employs data references that comprise the address of the disc block together with the value corresponding to the key. As a result, there is a considerable difference between hashing and indexing.