Terminology

By data I mean structured facts — the presentation of facts in a form useful in data engineering.

A fact may be an attribute of an entity or a relationship between entities.

A dataset is a collection of data.

When you download a dataset, you will usually be offered it in a choice of several database formats.

When I use the term database, I almost always mean dataset, since unstructured facts can also be stored in a database.

The newer term ‘factbase’ suggests a collection of facts.  When I use factbase I almost always mean a fact set.

Someone seems to have trademarked the name Factset to use as the name for their business.  I will use the lowercase term factset, by analogy with dataset.

One converts or partially converts a factset into a dataset by extracting the data from the set of facts.

Recursive Exhaustion is the name I use for an algorithm which seems to have been floating around out there.  Basically it converts a factset into a dataset by exhaustively extracting all of the data from it.

Regardless of database format, a dataset can be fixed or fluid.  The most interesting ones are fluid — constantly being updated.

The term recursive is not meant to imply any particular implementation, just that working with a fluid dataset is a recursive process.  To retrieve the most current data record from its database, you need to retrieve all related records.  They may also have changed, so the records on which they depend need to be examined in turn.
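As a rough illustration of that retrieval pattern, here is a minimal sketch in Python.  It is not the Recursive Exhaustion algorithm itself, only the idea of following related records until nothing new turns up.  The names refresh, fetch and related, and the toy store and links data, are all assumptions made up for the example.

```python
def refresh(key, fetch, related, seen=None):
    """Fetch `key` and, recursively, every record it depends on.

    fetch(key)   -> the current record for key (hypothetical lookup)
    related(key) -> the keys that key's record depends on (hypothetical)
    Returns a dict mapping each visited key to its freshly fetched record.
    """
    if seen is None:
        seen = {}
    if key in seen:
        # Already retrieved during this pass; this also stops cycles.
        return seen
    seen[key] = fetch(key)
    for dep in related(key):
        refresh(dep, fetch, related, seen)
    return seen


# Toy data, invented for illustration: record "a" depends on "b",
# and "b" depends on "c" and (cyclically) on "a".
store = {"a": 1, "b": 2, "c": 3, "d": 4}
links = {"a": ["b"], "b": ["c", "a"], "c": [], "d": []}

current = refresh("a", store.__getitem__, lambda k: links[k])
```

The seen dictionary is what keeps the recursion from looping forever when records refer back to each other.  Record "d" is never touched, because nothing reachable from "a" depends on it.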

For the purpose of this and all of my other websites, I seek only numerical datasets.

A numerical dataset is a matrix of numbers, plus labels for the rows and columns.
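One possible way to hold such a dataset in code is sketched below in Python.  The class name NumericalDataset, its field names and the sample figures are my own assumptions for illustration, not part of any particular database format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumericalDataset:
    row_labels: List[str]
    column_labels: List[str]
    values: List[List[float]]  # values[i][j] is row i, column j

    def __post_init__(self):
        # The matrix must have one row per row label and
        # one entry per column label in every row.
        if len(self.values) != len(self.row_labels):
            raise ValueError("number of rows does not match row labels")
        for row in self.values:
            if len(row) != len(self.column_labels):
                raise ValueError("row width does not match column labels")

    def cell(self, row: str, column: str) -> float:
        """Look up one number by its row and column labels."""
        i = self.row_labels.index(row)
        j = self.column_labels.index(column)
        return self.values[i][j]


# Made-up sample data, for illustration only.
sample = NumericalDataset(
    row_labels=["x", "y"],
    column_labels=["height", "width"],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
```

With this in place, sample.cell("y", "width") looks up the number in row y, column width.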

Ignoring column ordering and labels, some datasets will be the kind of algebraic structure that mathematicians call categories, where the rows represent the entities (objects, nodes) and the contents of the matrix encode the morphisms (arrows) between them.  Ideally the underlying structure is algebraically closed, meaning that any two arrows can be composed, in which case it forms a monoid.

In general, datasets will not have identities, and so will not be categories.  Instead they will be semicategories, also known as semigroupoids.  Ideally the underlying structure is algebraically closed, in which case it forms a semigroup.
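The difference between a semigroupoid and a semigroup can be sketched with a composition table in Python.  This assumes, purely for illustration, that composition is given as a dictionary from pairs of arrows to their composite, with missing pairs meaning "not composable".  The function names and both examples are made up, and the associativity check is a simplification that only compares the two groupings when both are defined.

```python
from itertools import product


def is_total(elements, compose):
    """True if every ordered pair of elements has a composite."""
    return all((a, b) in compose for a, b in product(elements, repeat=2))


def is_associative(elements, compose):
    """True if (a*b)*c == a*(b*c) whenever both sides are defined."""
    for a, b, c in product(elements, repeat=3):
        ab = compose.get((a, b))
        bc = compose.get((b, c))
        left = compose.get((ab, c)) if ab is not None else None
        right = compose.get((a, bc)) if bc is not None else None
        if left is not None and right is not None and left != right:
            return False
    return True


# Three arrows f: A -> B, g: B -> C, h: A -> C, with "g after f" = h.
# Only one pair composes, so composition is partial.
arrows = ["f", "g", "h"]
arrow_table = {("g", "f"): "h"}

# The numbers 0 and 1 under max: every pair composes.
numbers = [0, 1]
max_table = {(a, b): max(a, b) for a, b in product(numbers, repeat=2)}

partial_case = (is_total(arrows, arrow_table), is_associative(arrows, arrow_table))
closed_case = (is_total(numbers, max_table), is_associative(numbers, max_table))
```

In the first example composition is associative but not total, which is the semigroupoid situation.  In the second it is both, which is what I mean by a closed structure forming a semigroup.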