DESIGN AND IMPLEMENTATION OF A COMPUTERISED LIBRARY STOCK MATCHING SYSTEM

 

1.0 INTRODUCTION

An analyst can use data matching to reduce duplication and improve the quality of data sources. Matching examines the degree of duplication among all records from a single data source, yielding a weighted probability of a match for each pair of records compared. The analyst can then decide which records match and take the necessary action in the source data (Andes, De 1993).

Data matching has several advantages. It allows disparities between data values that should be equal to be eliminated, the correct values to be determined, and errors caused by data differences to be reduced. Names and addresses, for example, are frequently used to identify individuals over time, and matching to discover and repair inconsistencies in them aids both use and maintenance (Winkler 1993). Data matching also makes it possible to standardise the names of books in the library that are equivalent but were entered in different formats.

It is also vital to understand data matching as the merging of entries from many databases that relate to the same entities. Most of the time the entities under consideration are people, such as patients, consumers, taxpayers, or travellers, but this research focuses on data matching in a library setting. The study comprises the full or partial integration of two or more data sets based on shared information. It allows data gathered from several sources to be used more efficiently, increasing the value of the original sources, and it can lessen the burden on data subjects by eliminating the requirement for additional data collection. Where data matching involves the integration of records for the same units, however, the research's findings raise serious concerns regarding confidentiality and security (J.R. Copas and F.J. Hilton 1990).
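The weighted pairwise scoring described above can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: the field names (title, author, publisher) and weights are hypothetical, and Python's standard difflib is used as a stand-in string comparator.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalised string similarity in [0, 1] (difflib as a stand-in comparator)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(rec_a, rec_b, weights):
    """Weighted probability-style score for a pair of records:
    a weighted average of per-field similarities."""
    total = sum(weights.values())
    score = sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())
    return score / total

# Hypothetical bibliographic records and field weights
weights = {"title": 0.5, "author": 0.3, "publisher": 0.2}
rec_a = {"title": "Introduction to Algorithms", "author": "Cormen", "publisher": "MIT Press"}
rec_b = {"title": "Intro. to Algorithms", "author": "T. Cormen", "publisher": "MIT Press"}
score = match_score(rec_a, rec_b, weights)  # high score suggests a likely match
```

A threshold on the score (for example, treating pairs scoring above 0.8 as matches) would then determine which record pairs are flagged for action.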

1.1 RESEARCH PURPOSE

The goal of this project is to design and develop a computerised matching record system for the university library. While developing a data match for the school library, efforts will be made to attain confidence in the accuracy, completeness, robustness, and consistency of the chosen identifiers over time, because any flaw in such an identifier will result in incorrectly matched records.

1.2 RESEARCH OBJECTIVE

1. A common entity identifier will be constructed in the database by matching attributes that contain partially identifying information, such as the name of the publisher, the place of publication, and the dates of publication. The author's name and a brief biography could also be used (Winkler 1986, 1987).

2. Rather than developing a special survey to collect data for policy decisions, data from available book sources will be matched. This has potential advantages because those sources contain a greater amount of data, and the data may be more accurate owing to improvement over time (Swain et al. 1992).

1.3 THE SCOPE OF THE RESEARCH

The research outlines how all persons involved in producing data matching for the university library will meet their duty to maintain the confidentiality of data in their care, while also enhancing the value of those data through data matching where appropriate (W.S. Coper and M.E. Maron 1987).

1.4 RESEARCH LIMITATIONS

There will be numerous restrictions encountered during and after this research activity. Among these difficulties are:

1. Absence of a unique entity identifier and poor data quality.

2. The computational difficulty.

3. A scarcity of training data that includes the genuine match status.

4. Privacy and confidentiality concerns.

LACK OF A UNIQUE ENTITY IDENTIFIER

In most cases, the databases to be matched or de-duplicated lack unique entity identifiers or keys. Even where entity identifiers are available in the databases to be matched, one must be completely sure of their quality, completeness, robustness, and consistency across time, because any error in such identifiers will result in incorrectly matched records. Finally, if no entity identifiers are available in the databases to be compared, the matching must rely on attributes shared by the databases (Decurre, Y. 1998).
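In the absence of a unique key, a pseudo-identifier can be derived from the shared attributes themselves. The sketch below uses hypothetical field names: it builds a composite key from normalised attribute values and de-duplicates records, under the assumption that matching is exact after normalisation.

```python
def composite_key(record):
    """Derive a pseudo-identifier from shared attributes
    (hypothetical fields) when no unique entity ID exists."""
    return (
        record["title"].lower().strip(),
        record["author"].lower().strip(),
        record.get("year", ""),
    )

def deduplicate(records):
    """Keep the first record seen for each composite key."""
    seen = {}
    for rec in records:
        seen.setdefault(composite_key(rec), rec)
    return list(seen.values())

books = [
    {"title": "Data Matching", "author": "Christen", "year": "2012"},
    {"title": "data matching ", "author": "CHRISTEN", "year": "2012"},  # duplicate entry
    {"title": "Record Linkage", "author": "Fellegi", "year": "1969"},
]
unique = deduplicate(books)  # the duplicate collapses onto the first record
```

An exact composite key only catches duplicates that normalisation makes identical; approximate comparison is still needed for typographical variation.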

COMPLEXITY OF COMPUTATION

When matching two databases, each record from one database must potentially be compared with every record in the other database to determine whether the pair of records relates to the same entity. As the sizes of the databases to be matched grow, the computational complexity of data matching therefore grows quadratically.
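A standard way to tame this quadratic growth is blocking: records are grouped by a cheap key, and only records within the same block are compared. The sketch below is illustrative only; the blocking key (first letter of the title) is an assumption for demonstration, not a recommendation.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, block_key):
    """Generate only the pairs that share a blocking key,
    instead of all n*(n-1)/2 possible pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)

books = [
    {"title": "Algorithms"},
    {"title": "algorithm design"},
    {"title": "Databases"},
    {"title": "data matching"},
]
# Block on the first letter of the title (hypothetical, illustrative key)
pairs = list(candidate_pairs(books, lambda r: r["title"][0].lower()))
# 2 candidate pairs instead of the 6 a full pairwise comparison would need
```

The trade-off is that records assigned to different blocks are never compared, so a poorly chosen blocking key can miss true matches.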

MISSING TRAINING DATA CONTAINING THE ACTUAL MATCH STATUS

In many data matching systems the true status of two records matched across two databases is unknown: there is no ground truth or gold-standard data available that indicates whether two records correspond to the same entity. Without further information, it is impossible to be certain that the results of a data matching effort are valid (W.E. Deming and G.J. Glesser 1959).

PRIVACY AND CONFIDENTIALITY

As previously stated, privacy and confidentiality must be carefully considered when using data matching, which frequently relies on personal information such as names, addresses, and dates. The analysis of matched data has the potential to reveal attributes of persons or groups of organisations that are not apparent when analysing a single database alone (S.J. Harberman 1975).

1.5 JUSTIFICATION OF THE RESEARCH

One of the most important reasons why this research is necessary and appropriate is that it allows users to minimise disparities between data values that should be the same, to find the correct values, and to eliminate errors caused by data differences. Another justification is that it ensures that values which are equivalent but entered in a different format or style are rendered uniform (Hill, T. 1991).
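Rendering equivalent values uniform typically means normalising them to a canonical form before comparison. A minimal sketch, assuming title strings and an illustrative (not exhaustive) abbreviation table:

```python
import re

# Illustrative abbreviation table, not an exhaustive list
ABBREVIATIONS = {"intro.": "introduction", "vol.": "volume", "ed.": "edition"}

def normalise_title(raw):
    """Canonical form: lowercase, expand known abbreviations,
    strip punctuation, collapse whitespace."""
    words = [ABBREVIATIONS.get(w, w) for w in raw.strip().lower().split()]
    text = re.sub(r"[^\w\s]", "", " ".join(words))
    return re.sub(r"\s+", " ", text).strip()

# Differently formatted entries collapse to one canonical value
a = normalise_title("Intro. to Algorithms")
b = normalise_title("INTRODUCTION  TO  ALGORITHMS")
```

After normalisation, exact comparison of the canonical values catches variants that would otherwise appear as distinct entries.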

Furthermore, duplicate records in a database, where different identifiers are used for the same item, will be avoided (Fellegi 1999). Finally, data matching detects both exact and approximate matches, allowing the user or administrator to eliminate duplicate data as it is identified.

1.6 TERMS RELATED TO DATA MATCHING

1. Key: a set of data fields that serves as the basis for comparison in a data matching application.

2. Matched Results: a collection of matched records generated by a data matching application.

3. Matched records: A combination of two or more records linked as referring to the same entity.

4. Name inconsistencies: When the same individual is recorded by several agencies with conflicting identifying details.

5. Name Tokens: A part of a complete or raw name, such as a family name, first given name, or title.

6. Name Type: Describes the type of a person’s current or prior name, such as legal name, maiden name, or alias.

7. Non-matched records: Records for which the data matching application was unable to locate a corresponding record in one or more other data files. N.B.: this does not mean that a record for the individual does not exist elsewhere; it simply means that the program was unable to locate one.

8. Profile groupings: In the interpretation of identification data matching results, the assignment of matched records to specific groups based on how the matching records were collected. Used to better allocate resources to subsequent results processing.

9. Unicode standard: A character encoding standard that assigns each character a unique code of 1–4 bytes, covering the majority of the world’s languages.

10. Data matching: The gathering and comparison of data from various sources.

11. Data topology: The order relationship of specific data items to other data items.

12. Address components: The separate component elements/fields of an address string, such as the street number, street name, street type, and town/suburb.

13. Algorithm: A set of logic rules created during the data matching application’s design phase. The ‘blueprint’ used to convert logic principles into computer instructions detailing which steps to perform in what order.

14. Application: The ultimate software and hardware combination that performs data matching.

15. Control group: A set of records of a known type (e.g., previously discovered fraudulent identities, deceased persons) used to better evaluate data matching results.

16. Cross Agency: When data from one agency is matched with data from one or more other agencies.

17. Database: A structured collection of records or data maintained in a computer system.

18. Data cleansing: The proactive detection and remediation of data quality issues that impair an agency’s ability to use its data efficiently.

19. Data integrity: The quality of correctness, completeness, and conformity with the intent of the data’s authors, i.e. ‘fit for purpose’.

20. Enrollment: The process by which an individual enrolls in an agency. Involves the first gathering of identifying information.
