LPA Data Mining Toolkit
The LPA Data Mining Toolkit is a collection of routines, supplied in the form of an API, which support the discovery of rules and patterns within relational databases such as Microsoft Access, Oracle and SQL Server.
Three Phases of Discovery
The key stages of data mining as supported by the LPA Data Mining toolkit are:
- Selection of Data Source: The first stage is to select the appropriate data for analysis. The toolkit works from a single table and assumes that any joins, exclusions and views have been carried out beforehand, outside the toolkit. You then designate which columns are to be included in the search and discovery stage.
- Constructing a Target: The target is a formula that represents what you are interested in. It can be simple, such as 'customers who renewed maintenance', or compound, such as 'people with salary greater than 25K and owner_occupiers'. This formula, or target expression, drives the data mining investigation.
- Discovering Patterns: This is the main stage of the data mining process, where the patterns in the data are sought. The values in all other designated columns are analysed to see which values and value ranges contribute towards the target more than might normally be expected.
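To make the idea of a target expression concrete, here is a minimal Python sketch, assuming the selected table is available as a list of row dictionaries. The helpers `eq`, `gt`, `both` and `base_rate`, and the column names, are hypothetical illustrations rather than the toolkit's Prolog API; the sketch only shows how a compound target is built from simple conditions and evaluated row by row.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]

def eq(column: str, value: object) -> Condition:
    """Elementary condition: column = value (hypothetical helper)."""
    return lambda row: row[column] == value

def gt(column: str, threshold: float) -> Condition:
    """Elementary condition: column > threshold (hypothetical helper)."""
    return lambda row: row[column] > threshold

def both(*conds: Condition) -> Condition:
    """Compound condition: every part must hold (logical AND)."""
    return lambda row: all(c(row) for c in conds)

def base_rate(rows: List[Row], target: Condition) -> float:
    """Percentage of rows that satisfy the target expression."""
    return 100.0 * sum(1 for r in rows if target(r)) / len(rows)

# Hypothetical single-table data set; column names are illustrative only.
people = [
    {"salary": 31000, "owner_occupier": True},
    {"salary": 18000, "owner_occupier": True},
    {"salary": 27000, "owner_occupier": False},
    {"salary": 40000, "owner_occupier": True},
]

# 'people with salary greater than 25K and owner_occupiers'
target = both(gt("salary", 25000), eq("owner_occupier", True))
print(base_rate(people, target))  # 50.0
```

Representing conditions as small composable functions mirrors the way a simple formula can be extended into a compound one without changing the code that evaluates it.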
What is in the LPA Data Mining toolkit?
The LPA Data Mining toolkit contains:
- API Routines: A collection of routines for dealing with the three phases of data mining described above. These routines are presented as Prolog predicates and can be combined with almost all other LPA products and features. The routines often return lists and structures which the Prolog developer can manipulate easily. This makes the toolkit an ideal basis for Prolog developers who want to build their own data mining applications or add a data mining component to existing applications.
- Source Code Example: A fully documented source code example shows you how to build an interactive data mining application using the API routines and a set of dialogs designed with the Dialog Editor utility that comes with LPA Prolog.
- Sample Data Mining Application: A stand-alone, point-and-click desktop application, based on the source code example above, which you can use 'out-of-the-box' to demonstrate and explore the data mining concepts described here.
Run-time Deployment and Application Deployment
The LPA Data Mining toolkit can be integrated with almost all other LPA products and technology. By combining the LPA Data Mining toolkit with the Intelligence Server, it is possible to present the toolkit as a COM object for embedding within, for example, a Visual Basic application. By combining it with ProWeb, it is possible to develop a web-based data mining application.
How does the LPA Data Mining toolkit work?
For any given target, the LPA Data Mining toolkit counts rows to determine how much influence each column exerts on the target. The result is an ordered list of elementary conditions that are deemed to be influential. The toolkit then lets you explore how well these atomic conditions combine to produce candidate rules.
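The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say how influence is measured, so the sketch uses lift (the target rate among rows with a given value divided by the overall target rate) as a stand-in, handles only `column = value` conditions (value ranges are omitted), and uses a small invented loans table. `rank_conditions` is a hypothetical name, not part of the toolkit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

def rank_conditions(rows: List[Row], columns: List[str], target_col: str,
                    target_value: object) -> List[Tuple[str, object, float]]:
    """Rank elementary 'column = value' conditions by lift over the base rate.

    Lift is an assumed stand-in influence measure, not necessarily
    the measure the LPA toolkit uses.
    """
    total = len(rows)
    hits = sum(1 for r in rows if r[target_col] == target_value)
    base = hits / total
    ranked = []
    for col in columns:
        counts = defaultdict(lambda: [0, 0])  # value -> [rows, target hits]
        for r in rows:
            c = counts[r[col]]
            c[0] += 1
            if r[target_col] == target_value:
                c[1] += 1
        for value, (n, k) in counts.items():
            lift = (k / n) / base if base else 0.0
            if lift > 1.0:  # keep only conditions that favour the target
                ranked.append((col, value, round(lift, 2)))
    ranked.sort(key=lambda t: t[2], reverse=True)  # most influential first
    return ranked

# Invented example data.
cols = ("PurposeOfLoan", "StatusSex", "LoanApproved")
loans = [dict(zip(cols, t)) for t in [
    ("NewCar", "SingleMale", 1), ("NewCar", "SingleMale", 1),
    ("NewCar", "Female", 0), ("UsedCar", "SingleMale", 1),
    ("UsedCar", "Female", 0), ("Furniture", "Female", 0),
]]

print(rank_conditions(loans, ["PurposeOfLoan", "StatusSex"], "LoanApproved", 1))
# [('StatusSex', 'SingleMale', 2.0), ('PurposeOfLoan', 'NewCar', 1.33)]
```

The two surviving conditions are exactly the atomic conditions one would then try to combine into a candidate rule.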
What is a Candidate Rule?
Results are generated in the form of IF-THEN rules, several of which may be formed for the same target statement.
For example:
IF "PurposeOfLoan" = "NewCar" AND "StatusSex" = "SingleMale" THEN "LoanApproved" = 1
Associated with each candidate rule are statistics on truth (sometimes referred to as accuracy), coverage and significance.
Truth% = 33.33 Hit% = 15.33 Base% = 13.40 Significance% = 14.43 Entropy = -3.18
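The source does not define these statistics, so the following Python sketch is only one plausible reading. It computes Base% (share of all rows satisfying the THEN part), Hit% (assumed here to be coverage: share of all rows matching the IF part) and Truth% (share of covered rows that also satisfy the THEN part) for the example rule over a small invented table. Significance% and Entropy are omitted because their formulas are not given here; `rule_stats` is a hypothetical name.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def rule_stats(rows: List[Row], antecedent: Callable[[Row], bool],
               consequent: Callable[[Row], bool]) -> Dict[str, float]:
    """Truth%, Hit% and Base% for an IF antecedent THEN consequent rule.

    Assumed definitions (not confirmed by the toolkit documentation):
      Base%  = rows satisfying the consequent / all rows
      Hit%   = rows satisfying the antecedent / all rows (coverage)
      Truth% = rows satisfying both / rows satisfying the antecedent
    """
    total = len(rows)
    covered = [r for r in rows if antecedent(r)]
    correct = sum(1 for r in covered if consequent(r))
    return {
        "Truth%": round(100.0 * correct / len(covered), 2) if covered else 0.0,
        "Hit%": round(100.0 * len(covered) / total, 2),
        "Base%": round(100.0 * sum(1 for r in rows if consequent(r)) / total, 2),
    }

# Invented example data.
cols = ("PurposeOfLoan", "StatusSex", "LoanApproved")
loans = [dict(zip(cols, t)) for t in [
    ("NewCar", "SingleMale", 1), ("NewCar", "SingleMale", 1),
    ("NewCar", "Female", 0), ("UsedCar", "SingleMale", 1),
    ("UsedCar", "Female", 0), ("Furniture", "Female", 0),
    ("NewCar", "SingleMale", 0), ("Furniture", "SingleMale", 1),
]]

# IF "PurposeOfLoan" = "NewCar" AND "StatusSex" = "SingleMale" THEN "LoanApproved" = 1
stats = rule_stats(
    loans,
    antecedent=lambda r: r["PurposeOfLoan"] == "NewCar" and r["StatusSex"] == "SingleMale",
    consequent=lambda r: r["LoanApproved"] == 1,
)
print(stats)  # {'Truth%': 66.67, 'Hit%': 37.5, 'Base%': 50.0}
```

Under this reading, a rule is interesting when its Truth% is well above the Base%, i.e. when the IF part picks out rows where the target is more likely than usual.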
Performance
The LPA Data Mining toolkit generates large volumes of SQL queries to analyse the database. By exploiting the performance of the database engine, the LPA Data Mining toolkit offers a scalable and robust architecture.
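The idea of pushing the counting work into the database can be sketched as follows: one GROUP BY query per designated column, each returning, for every distinct value, the number of rows and the number of rows satisfying the target. The sketch uses Python's built-in `sqlite3` module with an in-memory table as a stand-in for an ODBC data source; the query shape, the `count_queries` helper and the table are hypothetical, not the SQL the toolkit actually emits.

```python
import sqlite3
from typing import Dict, List

def count_queries(table: str, columns: List[str], target_sql: str) -> Dict[str, str]:
    """Build one GROUP BY query per designated column.

    Each query returns (value, row count, target hits) for every distinct
    value of the column. Table and column names are assumed to be trusted,
    since they are interpolated into the SQL without escaping.
    """
    return {
        col: (f'SELECT "{col}", COUNT(*), '
              f'SUM(CASE WHEN {target_sql} THEN 1 ELSE 0 END) '
              f'FROM "{table}" GROUP BY "{col}" ORDER BY "{col}"')
        for col in columns
    }

# In-memory stand-in for a real data source, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE loans ("PurposeOfLoan" TEXT, "StatusSex" TEXT, "LoanApproved" INTEGER)')
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", [
    ("NewCar", "SingleMale", 1), ("NewCar", "SingleMale", 1),
    ("NewCar", "Female", 0), ("UsedCar", "SingleMale", 1),
    ("UsedCar", "Female", 0), ("Furniture", "Female", 0),
])

queries = count_queries("loans", ["PurposeOfLoan", "StatusSex"], '"LoanApproved" = 1')
results = {col: conn.execute(sql).fetchall() for col, sql in queries.items()}
print(results["PurposeOfLoan"])
# [('Furniture', 1, 0), ('NewCar', 3, 2), ('UsedCar', 2, 1)]
```

Only the aggregated counts travel back to the client, so the amount of data returned depends on the number of distinct values rather than the number of rows in the table.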
What is Required?
The LPA Data Mining toolkit uses ODBC and SQL to query databases. You need to ensure that the correct ODBC drivers are installed and that your data files are set up as ODBC data sources.
Integration with WIN-PROLOG and its Toolkits
WIN-PROLOG is the central product in a series of programming tools that work across Windows XP, 2000, NT, ME, 98 and 95; the series also includes flex, Flint, the CBR toolkit and the ProData Database Interface toolkit. The Windows series uses incremental compilation of user programs to provide the execution speed of a compiler with the interactive behaviour of an interpreter, which allows programs to be debugged and edited in-line.