Data Management Research Paper Topic Definition
Data management forms the foundation of most organizations. Data refers to raw, unorganized and unprocessed facts, for example, a person’s name or a school’s name. In information technology, data management is performed by database systems, which store huge amounts of organized data in such a way that the data can easily be accessed and managed. A database is a collection of related information together with the operations performed on it. Data and information are two very different things: information refers to processed data (Hoffer, Ramesh & Topi, 2013). An example distinguishes the two concepts. When a clerk inputs a customer’s details into the computer, the details are raw data. When those details are processed, structured, organized or presented in a different context, such as a report, so that they become useful, they are information.
In computing, data is managed using a database management system (DBMS), a system that facilitates the storage and retrieval of users’ data with high efficiency and strong security measures. The users of a DBMS include end users, who are the customers of the business and range from simple viewers to sophisticated users such as analysts. System designers are the professionals who design parts of the database by identifying and deciding on data formats, entities, constraints, views and relations. Administrators, on the other hand, manage the database by creating access profiles and managing DBMS resources such as licenses, software, and hardware. The development of the DBMS has brought many advantages as far as data is concerned. The system makes much paperwork unnecessary, reducing the cost of resources and storage space while providing permanent storage for many years (Hoffer, Ramesh & Topi, 2013). Most DBMS work is automated, so organizations do not incur high costs for file retrieval and information processing.
The modern data management system is improved and focuses on the most important issues related to data. Older systems managed data using file formats. The modern DBMS is based on real-world entities, which it models using behavior (methods) and attributes. For example, a person in a database may be described by actions such as walk, jump and run, and by characteristics such as age. The system architecture is built in such a way that it isolates data from applications and stores metadata to make its own processing easier. The system also has advanced features supporting concurrency and multi-user access during transactions, allowing users to access and manipulate the data in parallel (Garcia-Molina, Ullman & Widom, 2014). The system does all of this without the users needing to be aware of it. The normalization of data in a DBMS is yet another important feature for minimizing data redundancy.
This research paper covers some aspects of data management by the DBMS. The paper also introduces Structured Query Language (SQL), which is used for data retrieval and manipulation. The query language is an improvement in that it allows different options for filtering data during retrieval, unlike the traditional file processing systems.
The Database Management System (DBMS) is responsible for all data operations, as discussed earlier. The DBMS design is highly dependent on its underlying architecture, which can be single-tier or multi-tier. The architecture design can be decentralized, centralized or hierarchical. Single-tier architecture treats the DBMS as a single entity that the user works with directly. Database designers and programmers prefer this structure. The two-tier architecture adds an application layer through which users such as programmers access the DBMS. Three-tier architecture is widely used for DBMS design; it consists of three tiers, namely the presentation, application and database tiers (Garcia-Molina, Ullman & Widom, 2014). An advantage of multiple tiers is that the system is highly flexible to modify, because each tier is independent of the others.
The DBMS uses table structures rather than the older file system structures. In a DBMS, the tables are formed from entities and relations, so a user can understand the database architecture by looking at the table names. The notion of entities and the logical associations (relationships) among them gives rise to the Entity-Relationship (ER) model. An entity can take part in several relationships, and this gives rise to mapping cardinalities. The most commonly used database table model is the relational model, which is based on first-order predicate logic; for this model to give satisfactory results, normalization is essential (Garcia-Molina, Ullman & Widom, 2014). Normalization involves eliminating undesirable dependencies among the attributes of relations.
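The effect of normalization can be sketched with a small example. The snippet below uses Python's built-in sqlite3 module as a stand-in for a DBMS; the customer and customer_order tables and their columns are hypothetical names chosen for illustration only. Because the city is stored once in the customer table, and each order only refers to the customer by key (a one-to-many relationship), a single update is enough to keep every order consistent.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalized design (hypothetical schema): each fact is stored once.
# One customer can have many orders (one-to-many cardinality).
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'London')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor")],
)

# The city lives in one row only, so one UPDATE changes it everywhere.
conn.execute("UPDATE customer SET city = 'Paris' WHERE customer_id = 1")

rows = conn.execute("""
    SELECT o.order_id, c.city
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Had the city been copied onto every order row instead, the same change would need one update per order, and a missed row would leave the data inconsistent.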
The Structured Query Language is the most widely used language for relational databases and is based on relational algebra and calculus. SQL consists of data definition and data manipulation languages. Data Definition Language (DDL) is used to design and modify the database schema, while Data Manipulation Language (DML) is used for storing and retrieving data from the database (Celko, 2005). SQL also provides the relational algebra operations such as select, project, union, Cartesian product, set difference and rename. Using SQL, the programmer is also able to perform join operations on many relations. It is, however, important that the database schemas and relations be properly designed to minimize data redundancy and database inefficiency.
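As a rough illustration, each relational algebra operation named above has a direct SQL counterpart. The sketch below runs them through Python's sqlite3 module; the student and tutor tables and the helper q are hypothetical, introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample relations with the same attributes.
conn.executescript("""
CREATE TABLE student (name TEXT, dept TEXT);
CREATE TABLE tutor   (name TEXT, dept TEXT);
INSERT INTO student VALUES ('Ann', 'CS'), ('Bob', 'Math');
INSERT INTO tutor   VALUES ('Bob', 'Math'), ('Eve', 'CS');
""")

def q(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

# Select: keep only the rows that satisfy a predicate.
selection = q("SELECT name, dept FROM student WHERE dept = 'CS'")

# Project: keep only some columns (DISTINCT removes duplicate rows).
projection = q("SELECT DISTINCT dept FROM student ORDER BY dept")

# Union and set difference of two compatible relations.
union = q("SELECT name FROM student UNION SELECT name FROM tutor ORDER BY name")
difference = q("SELECT name FROM student EXCEPT SELECT name FROM tutor")

# Cartesian product: every student row paired with every tutor row.
product = q("SELECT s.name, t.name FROM student AS s CROSS JOIN tutor AS t")

# Rename: give a column a new name in the result.
renamed = conn.execute("SELECT name AS student_name FROM student").description[0][0]

# Join: pair students and tutors that share a department.
joined = q("""
    SELECT s.name, t.name
    FROM student AS s JOIN tutor AS t ON s.dept = t.dept
    ORDER BY s.name
""")
```

A join can be read as a Cartesian product followed by a selection, which is why the joined result is a subset of the product.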
The primary application of database systems is transaction processing. In transactions, several people access the database from the presentation tier to perform operations such as alter, delete or select. This topic is very broad but essential to database management systems, so I will provide only brief explanations of some of its important aspects. An advantage of the DBMS, as discussed earlier, is its concurrency control ability. Transactions are characterized by the ACID (Atomicity, Consistency, Isolation, and Durability) properties. The system allows users to access data in the database and perform transactions in parallel without conflicts or deadlock situations (Hoffer, Ramesh & Topi, 2013). For this to happen, a transaction schedule is created; the user can perform read/write operations, and the changes made take effect only when he/she commits. An example of such a scenario is banking, where several users perform read/write operations on the same account at the same time. The DBMS should ensure that the users access the accounts without interference and without deadlocks occurring in the system.
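The banking scenario can be sketched as follows. This is a minimal, single-connection illustration of atomicity only, not of concurrent access; the account table, the CHECK rule that balances may not go negative, and the transfer function are all assumptions made for the example. It uses Python's sqlite3 module, whose connection object commits when a with block succeeds and rolls back when an exception is raised inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a balance may never become negative.
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK rule, so the credit was undone too.
        return False

ok = transfer(conn, 1, 2, 30)        # enough funds: committed
rejected = transfer(conn, 1, 2, 500)  # not enough funds: rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, rejected, balances)
```

The second transfer credits account 2 before the debit fails, yet the final balances show no trace of that credit, which is the all-or-nothing behavior that atomicity requires.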
SQL as used for creating, modifying, retrieving and manipulating data in an RDBMS
SQL, expanded as Structured Query Language, is the most popular computer language for the creation, modification, retrieval and manipulation of data in relational database management systems (Egenhofer, 1994). SQL has evolved beyond its original purpose and is now also used to support object-relational DBMSs. SQL is an ANSI/ISO standard. The language is designed for a specific purpose, that is, querying the data stored in relational database management systems. Thus, it is a set-based, declarative computer language, unlike programming languages such as C or Pascal, which are designed to address a broader range of problems.
Data retrieval is the most frequently used operation in transactional databases. When the SQL language is restricted to the data retrieval commands, it is used as a functional language (Egenhofer, 1992). It uses the SELECT statement to retrieve one or more rows from one or several database tables. SELECT is the most commonly used DML command in many applications. When the user writes a SELECT query, he/she specifies the required results of the query but not the physical operations that should be executed to achieve that output. It is the role of the database to translate the query into an efficient query plan that can, in turn, produce the desired results. The query optimizer is the component responsible for this translation. The SELECT statement does not work alone; it is accompanied by several keywords that include FROM, WHERE, GROUP BY, HAVING, and ORDER BY, among others.
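The separation between what a query asks for and how it is executed can be observed in a small experiment. The sketch below assumes SQLite (through Python's sqlite3 module) as the database, and EXPLAIN QUERY PLAN is an SQLite-specific command whose output format varies between versions; the customer table and the index name are hypothetical. The same SELECT text is run before and after an index is added: the plan may change, but the result does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bob", "Rome"), ("Eve", "Oslo")],
)

# The query states the desired result, not the access method.
query = "SELECT name FROM customer WHERE city = ? ORDER BY name"

result_before = conn.execute(query, ("Oslo",)).fetchall()
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, ("Oslo",)).fetchall()

# Adding an index gives the optimizer a new physical option.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

result_after = conn.execute(query, ("Oslo",)).fetchall()
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, ("Oslo",)).fetchall()

# The last column of each plan row is a human-readable description.
for label, plan in (("before", plan_before), ("after", plan_after)):
    print(label, [row[-1] for row in plan])
```

The query text never mentions the index; whether to use it is left to the optimizer.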
The FROM keyword identifies the table or tables from which rows are to be retrieved, and the WHERE keyword restricts which of those rows are retrieved; WHERE is applied before the GROUP BY keyword. GROUP BY combines rows that share related values into a smaller set of grouped rows. The HAVING keyword identifies which of the combined rows are to be retrieved. It acts in a similar manner to the WHERE keyword, although it operates on the output of GROUP BY and can therefore use aggregate functions. ORDER BY then identifies the columns used to sort the resulting data. When any of these keywords appears in a statement, the DBMS automatically performs the specified action, since it understands these statements (Egenhofer, 1994). For that reason, these keywords should not be used as names in a program unless they are delimited (in standard SQL, enclosed in double quotes) to prevent the DBMS from interpreting them as keywords.
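The order in which these keywords take effect can be traced in a short example. The sketch below uses Python's sqlite3 module; the sale table and its figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [
        ("north", 10), ("north", 40),
        ("south", 5), ("south", 7),
        ("east", 100), ("east", 1),
    ],
)

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sale                  -- source table
    WHERE amount >= 5          -- filters single rows, before grouping
    GROUP BY region            -- one output row per region
    HAVING SUM(amount) > 20    -- filters groups, may use aggregates
    ORDER BY total DESC        -- sorts the final result
""").fetchall()
print(rows)
```

WHERE first drops the single east sale of 1. After grouping, the totals are north 50, south 12 and east 100; HAVING removes south, and ORDER BY places east ahead of north.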
Before looking at how data manipulation is carried out in SQL, it is essential to understand the Data Manipulation Language (Leiter, 2009), which was introduced earlier in this research paper. DML operations such as add, update and delete form a subset of the keywords that SQL uses to query the database and manipulate its data. For instance, the INSERT keyword is used to add rows to an existing table of a database. The UPDATE keyword modifies the values of a set of existing rows in a database table. MERGE is another data manipulation keyword; it is used to combine the data from multiple tables and is a combination of the INSERT and UPDATE elements. Before the element was defined in the SQL:2003 standard, databases provided similar functionality through various syntaxes, sometimes called “upsert.” TRUNCATE is another data manipulation element, used for deleting all the data in a table. The DELETE command, on the other hand, removes zero or more rows from a database table.
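A minimal sketch of these DML operations follows, again using Python's sqlite3 module. SQLite has no MERGE or TRUNCATE statement, so the example swaps in SQLite's own upsert syntax (INSERT ... ON CONFLICT DO UPDATE, which assumes SQLite 3.24 or later) to show the insert-or-update behavior. The stock table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")

# INSERT adds rows to an existing table.
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("pen", 5), ("ink", 2), ("pad", 0)],
)

# UPDATE modifies the values of the rows matched by WHERE.
updated = conn.execute(
    "UPDATE stock SET qty = qty + 10 WHERE item = 'pen'"
).rowcount

# DELETE removes zero or more rows, depending on what WHERE matches.
deleted_none = conn.execute("DELETE FROM stock WHERE qty < 0").rowcount
deleted_one = conn.execute("DELETE FROM stock WHERE qty = 0").rowcount

# Upsert (SQLite-specific stand-in for MERGE): insert the row, or
# update the existing one when the primary key is already present.
upsert = """
    INSERT INTO stock (item, qty) VALUES (?, ?)
    ON CONFLICT(item) DO UPDATE SET qty = qty + excluded.qty
"""
conn.execute(upsert, ("ink", 3))  # 'ink' exists: quantity is increased
conn.execute(upsert, ("cap", 4))  # 'cap' is new: row is inserted

rows = conn.execute("SELECT item, qty FROM stock ORDER BY item").fetchall()
print(updated, deleted_none, deleted_one, rows)
```

Other database systems express the same idea with MERGE or with their own upsert syntax, so the statement text here should not be read as portable SQL.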
Transactions, where available, are used to wrap Data Manipulation Language operations. For example, START TRANSACTION marks the beginning of a database transaction, which either completes entirely or not at all. The COMMIT element makes the changes of a given transaction permanent. The ROLLBACK keyword, however, discards all the data changes made since the last COMMIT or ROLLBACK, so that the data reverts to the state it was in before the transaction (Date & Darwen, 1997). Both COMMIT and ROLLBACK interact with areas such as transaction control and locking. They both terminate any transaction in progress and release any locks held on the data. When the START TRANSACTION statement is absent, the semantics of SQL depend on the implementation.
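The effect of COMMIT and ROLLBACK can be sketched with explicit statements. The example assumes SQLite through Python's sqlite3 module, where a transaction is opened with BEGIN rather than START TRANSACTION; passing isolation_level=None stops the module from opening transactions on its own. The note table is hypothetical.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so the
# transaction boundaries below are exactly the SQL statements shown.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE note (body TEXT)")

# First transaction: made permanent by COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO note VALUES ('kept')")
conn.execute("COMMIT")

# Second transaction: discarded by ROLLBACK.
conn.execute("BEGIN")
conn.execute("INSERT INTO note VALUES ('discarded')")
count_inside = conn.execute("SELECT COUNT(*) FROM note").fetchall()[0][0]
conn.execute("ROLLBACK")

rows_after = conn.execute("SELECT body FROM note").fetchall()
print(count_inside, rows_after)
```

Inside the second transaction the new row is visible to the connection that wrote it, but after ROLLBACK only the committed row remains.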
The other set of keywords is the DDL, expanded as Data Definition Language. DDL allows the database manager or the programmer to define new tables as well as the related elements. Most commercial database management systems based on SQL contain proprietary extensions that give them control over other features of the database management system (Egenhofer, 1994). The basic items used for data definition include the CREATE and DROP commands. The CREATE command creates an object, such as a table, within the database. The DROP command, on the other hand, causes an existing database object to be eliminated irretrievably. Some database management systems also provide the ALTER command, which permits the modification of existing objects in different ways, for instance, adding one or more columns to an existing database table.
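The three DDL commands can be followed step by step in a short sketch. It uses Python's sqlite3 module; the employee table is hypothetical, and PRAGMA table_info and the sqlite_master catalog are SQLite-specific ways of inspecting the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """Return the column names of a table (SQLite-specific PRAGMA)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def tables():
    """Return the names of all tables in the database."""
    sql = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

# CREATE brings a new object into existence.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
cols_created = columns("employee")

# ALTER modifies an existing object, here by adding a column.
conn.execute("ALTER TABLE employee ADD COLUMN email TEXT")
cols_altered = columns("employee")

# DROP removes the object together with its data.
conn.execute("DROP TABLE employee")
tables_after_drop = tables()

print(cols_created, cols_altered, tables_after_drop)
```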
Data control is the third group of SQL keywords, which make up the Data Control Language (DCL). The DCL handles the authorization aspects of data, allowing the database user or manager to control who has access to the data within the database (Leiter, 2009). Its two main keywords are GRANT and REVOKE. The GRANT keyword authorizes a user to carry out one or more operations on a database object. The REVOKE statement, on the other hand, removes or limits the capability of a given user to carry out an operation or a set of operations on some object of the database.
Data is crucial to any organization. As discussed above, in my view the best way to manage data is to use a database management system. Such systems are easy to use and efficient, and most importantly they organize information in a way that allows easy retrieval and alteration. Also, the DBMS structure is built in such a way that the logical schema is independent of the physical database schema, which allows the system to be easily expanded to suit users’ needs.
Celko, J. (2005). Joe Celko’s SQL for smarties: Advanced SQL programming. Amsterdam: Morgan Kaufmann.
Garcia-Molina, H., Ullman, J. D., & Widom, J. (2014). Database systems: The complete book.
Hoffer, J. A., Ramesh, V., & Topi, H. (2013). Modern database management. Boston: Pearson.