The AI Method Evaluation Ontology

Latest version:
https://www.w3id.org/iSeeOnto/aimethodevaluation
Contributors:
Anjana Wijekoon
Chamath Palihawadana
David Corsar
Ikechukwu Nkisi-Orji
Juan A. Recio-Garcia
Marta Caro Martínez
Imported Ontologies:
explanationPattern.owl
sio.owl
cpannotationschema.owl
prov-o
aimodel
eo
Download serialization:
RDF/XML
License:
http://insertlicenseURIhere.org
Visualization:
Visualize with WebVowl
Cite as:
The AI Method Evaluation Ontology.

Ontology Specification Draft

Abstract

The AI Method Evaluation Ontology models assessments, such as accuracy, F1 score, etc., of an AI Method. An assessment activity is performed to evaluate the performance of an AI Method, using a metric which defines how the assessment should be performed. The outcome of the assessment is captured in the Result concept. The metric measures some aspect of the AI Method, such as accuracy, precision, recall, completeness, etc. The assessment is performed by an Agent, either software or human. The PROV properties startedAtTime and endedAtTime are used to record when the assessment took place. This pattern is based on that of Qual-O, defined in C. Baillie, P. Edwards, and E. Pignotti. 2015. QUAL: A Provenance-Aware Quality Model. J. Data and Information Quality 5, 3, Article 12 (February 2015). DOI: https://doi.org/10.1145/2700413. This ontology was created as part of the iSee project (https://isee4xai.com), which received funding from EPSRC under grant number EP/V061755/1. iSee is part of the CHIST-ERA pathfinder programme for European coordinated research on future and emerging information and communication technologies.
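The pattern described above can be sketched as instance data in Turtle. This is a minimal, hypothetical example: the `ex:` resources are invented for illustration, and since this section does not list the ontology's object properties, the links between activity, metric, and result are expressed with standard PROV-O properties (prov:used, prov:generated, prov:value) as an assumption; only the classes, the named individual aieval:Accuracy, and prov:startedAtTime/prov:endedAtTime are taken from the ontology documentation itself.

```turtle
@prefix aieval: <http://www.w3id.org/iSeeOnto/aimodelevaluation#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# A hypothetical assessment of a classifier's accuracy.
ex:assessment1 a aieval:AIModelAssessment ;
    prov:wasAssociatedWith ex:evaluationService ;          # the (software) Agent
    prov:startedAtTime "2023-01-10T09:00:00Z"^^xsd:dateTime ;
    prov:endedAtTime   "2023-01-10T09:05:00Z"^^xsd:dateTime ;
    prov:used          aieval:Accuracy ;                   # the guiding metric (assumed property)
    prov:generated     ex:result1 .

ex:result1 a aieval:AIModelAssessmentResult ;
    prov:value "0.92"^^xsd:decimal .                       # assumed property for the score

ex:evaluationService a prov:SoftwareAgent .
```

In a real deployment the assessment would instead use whatever object properties the ontology (or the Qual-O pattern it follows) defines for linking the activity to its metric, dimension, and result.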

Introduction back to ToC

Namespace declarations

Table 1: Namespaces used in the document
aieval<https://www.w3id.org/iSeeOnto/aimethodevaluation>
schema<http://schema.org>
owl<http://www.w3.org/2002/07/owl>
xsd<http://www.w3.org/2001/XMLSchema>
skos<http://www.w3.org/2004/02/skos/core>
rdfs<http://www.w3.org/2000/01/rdf-schema>
cito<http://purl.org/spar/cito>
prov-o<http://www.w3.org/TR/prov-o>
terms<http://purl.org/dc/terms>
xml<http://www.w3.org/XML/1998/namespace>
vann<http://purl.org/vocab/vann>
aimodel<http://www.w3id.org/iSeeOnto/aimodel>
prov<http://www.w3.org/ns/prov>
foaf<http://xmlns.com/foaf/0.1>
void<http://rdfs.org/ns/void>
resource<http://semanticscience.org/resource>
Qual-O<http://sensornet.abdn.ac.uk/onts/Qual-O>
protege<http://protege.stanford.edu/plugins/owl/protege>
cpannotationschema<http://www.ontologydesignpatterns.org/schemas/cpannotationschema.owl>
eo<https://purl.org/heals/eo>
core<http://purl.org/vocab/frbr/core>
rdf<http://www.w3.org/1999/02/22-rdf-syntax-ns>
aieval<http://www.w3id.org/iSeeOnto/aimodelevaluation>
obo<http://purl.obolibrary.org/obo>
dc<http://purl.org/dc/elements/1.1>

The AI Method Evaluation Ontology: Overview back to ToC

This ontology has the following classes and properties.

Classes

Named Individuals

The AI Method Evaluation Ontology: Description back to ToC

Outline of the AI Method Evaluation ontology main classes and relationships. Concepts highlighted in blue are defined in this ontology.

Cross reference for The AI Method Evaluation Ontology classes, properties and dataproperties back to ToC

This section provides details for each class and property defined by The AI Method Evaluation Ontology.

Classes

AI Model Assessmentc back to ToC or Class ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#AIModelAssessment

The activity that made an assessment of an AI Model, guided by a metric, to generate a Result. The assessment can be associated with the agent (e.g. User) that performed it.
has super-classes
assessment c

AI Model Assessment Dimensionc back to ToC or Class ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#AIModelAssessmentDimension

The dimension, such as Accuracy, Precision, Recall, etc., that an evaluation assessed.
has super-classes
dimension c
has members
Data Quality ni, Network Usage ni, Performance ni, Robustness ni, Speed ni, Stability ni

AI Model Assessment Metricc back to ToC or Class ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#AIModelAssessmentMetric

The criteria used to guide the assessment of an AI Model and determine the result.
has super-classes
metric c
has members
AU-ROC ni, Accuracy ni, Adjusted Rand Index ni, BLEU ni, Brier Score ni, Calinski-Harabasz Index ni, Cohen's Kappa Coefficient ni, Coverage ni, Davies-Bouldin Index ni, Dice Index ni, Discounted cumulative gain ni, Diversity ni, Dunn Index ni, F1-score (macro) ni, F1-score (micro) ni, Fowlkes–Mallows index ni, Hamming Loss ni, Hopkins statistic ni, Inference Speed ni, Jaccard Score ni, METEOR ni, Mathews Correlation Coefficient ni, Mean Absolute Error ni, Mean Squared Error ni, Mutual Information ni, NIST ni, Perplexity ni, Precision ni, Purity ni, R squared ni, ROUGE ni, Rand Index ni, Recall ni, Recommender persistence ni, Root Mean Squared Error ni, Serendipity ni, Silhouette Score ni, Training Speed ni, True Negative Rate ni, WER ni, Youden's J statistic ni

AI Model Assessment Resultc back to ToC or Class ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#AIModelAssessmentResult

The result of assessing a specified dimension of an AI Model, as described by the metric specification.
has super-classes
result c

Named Individuals

Accuracyni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Accuracy

Accuracy is how close a given set of measurements (observations or readings) are to their true value.
belongs to
AI Model Assessment Metric c

Adjusted Rand Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Adjusted_Rand_Index

The adjusted Rand index is the corrected-for-chance version of the Rand index.
Source
https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index
belongs to
AI Model Assessment Metric c

AU-ROCni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#AU-ROC

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
Source
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
belongs to
AI Model Assessment Metric c

BLEUni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#BLEU

BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another.
Source
https://en.wikipedia.org/wiki/BLEU
belongs to
AI Model Assessment Metric c

Brier Scoreni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Brier_Score

The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions. For unidimensional predictions, it is strictly equivalent to the mean squared error as applied to predicted probabilities.
Source
https://en.wikipedia.org/wiki/Brier_score
belongs to
AI Model Assessment Metric c

Calinski-Harabasz Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Calinski-Harabasz_Index

The index is the ratio of the sum of between-clusters dispersion and of within-cluster dispersion for all clusters (where dispersion is defined as the sum of distances squared).
Source
https://scikit-learn.org/stable/modules/clustering.html#calinski-harabasz-index
belongs to
AI Model Assessment Metric c

Cohen's Kappa Coefficientni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Cohens_Kappa_Coefficient

Cohen's kappa coefficient (κ, lowercase Greek kappa) is a statistic that is used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items.
Source
https://en.wikipedia.org/wiki/Cohen%27s_kappa
belongs to
AI Model Assessment Metric c

Coverageni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Coverage

belongs to
AI Model Assessment Metric c

Data Qualityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Data_Quality

belongs to
AI Model Assessment Dimension c

Davies-Bouldin Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Davies-Bouldin_Index

The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms. This is an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset.
Source
https://en.wikipedia.org/wiki/Davies%E2%80%93Bouldin_index
belongs to
AI Model Assessment Metric c

Dice Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Dice_Index

The Sørensen–Dice coefficient is a statistic used to gauge the similarity of two samples.
Source
https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
belongs to
AI Model Assessment Metric c

Discounted cumulative gainni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Discounted_cumulative_gain

Discounted cumulative gain (DCG) is a measure of ranking quality.
belongs to
AI Model Assessment Metric c

Diversityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Diversity

Source
https://en.wikipedia.org/wiki/Recommender_system#Performance_measures
belongs to
AI Model Assessment Metric c

Dunn Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Dunn_Index

The Dunn index (DI) (introduced by J. C. Dunn in 1974) is a metric for evaluating clustering algorithms.
Source
https://en.wikipedia.org/wiki/Dunn_index
belongs to
AI Model Assessment Metric c

F1-score (macro)ni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#F1-score_(macro)

F-score or F-measure (macro) is a measure of a test's accuracy calculated from macro-averaging (taking all classes as equally important) the precision and recall of the test.
Source
https://en.wikipedia.org/wiki/F-score
belongs to
AI Model Assessment Metric c

F1-score (micro)ni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#F1-score_(micro)

F-score or F-measure (micro) is a measure of a test's accuracy calculated from micro-averaging (biased by class frequency) the precision and recall of the test.
Source
https://en.wikipedia.org/wiki/F-score
belongs to
AI Model Assessment Metric c

Fowlkes–Mallows indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Fowlkes–Mallows_index

The Fowlkes–Mallows index is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm), and also a metric to measure confusion matrices.
Source
https://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index
belongs to
AI Model Assessment Metric c

Hamming Lossni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Hamming_Loss

Source
https://en.wikipedia.org/wiki/Hamming_distance
belongs to
AI Model Assessment Metric c

Hopkins statisticni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Hopkins_statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.
Source
https://en.wikipedia.org/wiki/Hopkins_statistic
belongs to
AI Model Assessment Metric c

Inference Speedni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Inference_Speed

belongs to
AI Model Assessment Metric c

Jaccard Scoreni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Jaccard_Score

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets.
Source
https://en.wikipedia.org/wiki/Jaccard_index
belongs to
AI Model Assessment Metric c

Mathews Correlation Coefficientni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Mathews_Correlation_Coefficient

Matthews correlation coefficient (MCC) is used as a measure of the quality of binary (two-class) classifications.
Source
https://en.wikipedia.org/wiki/Phi_coefficient
belongs to
AI Model Assessment Metric c

Mean Absolute Errorni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Mean_Absolute_Error

Mean absolute error (MAE) is a measure of errors between paired observations expressing the same phenomenon as the sum of absolute errors divided by the sample size.
Source
https://en.wikipedia.org/wiki/Mean_absolute_error
belongs to
AI Model Assessment Metric c

Mean Squared Errorni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Mean_Squared_Error

Mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors - that is, the average squared difference between the estimated values and the actual value.
Source
https://en.wikipedia.org/wiki/Mean_squared_error
belongs to
AI Model Assessment Metric c

METEORni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#METEOR

METEOR (Metric for Evaluation of Translation with Explicit ORdering) is a metric for the evaluation of machine translation output based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.
Source
https://en.wikipedia.org/wiki/METEOR
belongs to
AI Model Assessment Metric c

Mutual Informationni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Mutual_Information

The mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.
Source
https://en.wikipedia.org/wiki/Mutual_information
belongs to
AI Model Assessment Metric c

Network Usageni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Network_Usage

belongs to
AI Model Assessment Dimension c

NISTni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#NIST

NIST is a method based on the BLEU metric for evaluating the quality of text which has been translated using machine translation.
Source
https://en.wikipedia.org/wiki/NIST_(metric)
belongs to
AI Model Assessment Metric c

Performanceni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#PredictivePerformance

belongs to
AI Model Assessment Dimension c

Perplexityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Perplexity

Perplexity is a measurement of how well a probability distribution or probability model predicts a sample.
Source
https://en.wikipedia.org/wiki/Perplexity
belongs to
AI Model Assessment Metric c

Precisionni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Precision

Precision (positive predictive value) is the fraction of relevant instances among the retrieved instances.
Source
https://en.wikipedia.org/wiki/Precision_and_recall
belongs to
AI Model Assessment Metric c

Purityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Purity

Purity is a measure of the extent to which clusters contain a single class.
Source
https://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_and_assessment
belongs to
AI Model Assessment Metric c

R squaredni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#R_squared

R2 (coefficient of determination) is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
Source
https://en.wikipedia.org/wiki/Coefficient_of_determination
belongs to
AI Model Assessment Metric c

Rand Indexni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Rand_Index

The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings.
Source
https://en.wikipedia.org/wiki/Rand_index
belongs to
AI Model Assessment Metric c

Recallni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Recall

Recall (sensitivity) is the fraction of relevant instances that were retrieved.
Source
https://en.wikipedia.org/wiki/Precision_and_recall
belongs to
AI Model Assessment Metric c

Recommender persistenceni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Recommender_persistence

belongs to
AI Model Assessment Metric c

Robustnessni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Robustness

belongs to
AI Model Assessment Dimension c

Root Mean Squared Errorni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Root_Mean_Squared_Error

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed.
Source
https://en.wikipedia.org/wiki/Root-mean-square_deviation
belongs to
AI Model Assessment Metric c

ROUGEni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#ROUGE

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing.
Source
https://en.wikipedia.org/wiki/ROUGE_(metric)
belongs to
AI Model Assessment Metric c

Serendipityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Serendipity

belongs to
AI Model Assessment Metric c

Silhouette Scoreni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Silhouette_Score

Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object has been classified.
Source
https://en.wikipedia.org/wiki/Silhouette_(clustering)
belongs to
AI Model Assessment Metric c

Speedni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Speed

belongs to
AI Model Assessment Dimension c

Stabilityni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Stability

belongs to
AI Model Assessment Dimension c

Training Speedni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Training_Speed

belongs to
AI Model Assessment Metric c

True Negative Rateni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#True_Negative_Rate

Specificity (true negative rate) refers to the probability of a negative test, conditioned on truly being negative.
Source
https://en.wikipedia.org/wiki/Sensitivity_and_specificity
belongs to
AI Model Assessment Metric c

WERni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#WER

Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system.
Source
https://en.wikipedia.org/wiki/Word_error_rate
belongs to
AI Model Assessment Metric c

Youden's J statisticni back to ToC or Named Individual ToC

IRI: http://www.w3id.org/iSeeOnto/aimodelevaluation#Youden's_J_statistic

Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichotomous diagnostic test.
Source
https://en.wikipedia.org/wiki/Youden%27s_J_statistic
belongs to
AI Model Assessment Metric c

Legend back to ToC

c: Classes
op: Object Properties
dp: Data Properties
ni: Named Individuals

References back to ToC

C. Baillie, P. Edwards, and E. Pignotti. 2015. QUAL: A Provenance-Aware Quality Model. J. Data and Information Quality 5, 3, Article 12 (February 2015). DOI: https://doi.org/10.1145/2700413

Acknowledgments back to ToC

The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document, and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.