Algorithm Explained

For general information and warnings, see the older front page.

There is an algorithm for collecting data on individuals by exploiting information about their social connections.

It could be applied to use data from friends, family, employers and important places in your life, but to start with, let’s consider just your ancestors.

Simple example

In the diagram above, used as a header throughout this site, the black boxes represent initial data.

As an example, assume that the initial data is just two numbers, the latitude and longitude of a person’s birthplace.

Giving your place of birth says little about you. But you may import your parents’ birthplaces as additional facts about you. Then instead of two numbers in your description there are six.

Do the same for your parents. Each of them will then have a six-field description. Now import the non-duplicate fields from their descriptions. That would be four each, bringing yours to fourteen.

Keep going as far back as you can. Depending on how well you know your ancestry, the result will be a long series of numbers which tells a story about you. In my case it tells of people from Ireland, England and France making their way across the Atlantic and traveling to Canada’s pleasant Pacific Southwest.
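The importing steps above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the `Person` class, the `describe` and `make_tree` functions, and the coordinates are all made up for the example, and it assumes every ancestor appears only once in the tree, so there are no duplicate fields to skip.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """A person with a birthplace and, where known, two parents."""
    name: str
    birthplace: tuple[float, float]  # (latitude, longitude)
    mother: Optional["Person"] = None
    father: Optional["Person"] = None


def describe(person: Person, generations: int) -> list[float]:
    """Return the person's own two numbers followed by those of their
    ancestors, going back at most `generations` generations."""
    fields = list(person.birthplace)
    if generations > 0:
        for parent in (person.mother, person.father):
            if parent is not None:
                fields.extend(describe(parent, generations - 1))
    return fields


def make_tree(depth: int, label: str = "me") -> Person:
    """Build a complete family tree `depth` generations deep.
    The coordinates are placeholders, not real birthplaces."""
    person = Person(label, (float(depth), float(len(label))))
    if depth > 0:
        person.mother = make_tree(depth - 1, label + "'s mother")
        person.father = make_tree(depth - 1, label + "'s father")
    return person


me = make_tree(3)
for generations in range(4):
    # Prints 2, 6, 14 and 30 fields for 0, 1, 2 and 3 generations.
    print(generations, len(describe(me, generations)))
```

Going back one generation gives the six numbers mentioned above, and two generations gives fourteen. Where a parent is unknown, the sketch simply stops on that branch.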

So from only two numbers per person, I end up with many, telling a story, forming a good description of me. Of all the people in the world, only my brother shares the same description.

This is a good example because all of the data back a few generations is a matter of public record. It may be difficult to obtain, but it’s there.

Redundancy

The above example has little redundancy in it. Each pair of numbers contributes an important piece of information.

But more data could be added: birthdates, level of education, occupation, height and weight. For the sake of argument, assume there is even more. Then when you add your parents’ data to your own, there is some redundancy. The attraction of your parents to one another is somewhat predictable. You are the kind of person you might expect your mother to give birth to, and the kind of person you might expect your father to sire.

So the actual amount of information acquired by adding your parents’ data to yours is less. Instead of ending up with three times as much as you started with, you may have only twice as much.

Information from public records

A lot of other information is available from public records, including marriages, property ownership, law court proceedings and voting rolls.

Regardless of their contents, these records serve to link people. You can import into your own description those of people you are linked to. They can import into their descriptions those of others they are linked to.

So even just using public records your description could be expanded again and again.

Thought of as an iterative process, the number of fields in your description would increase dramatically at each step.

There would be some redundancy in this process, but the amount of information in your record would at the very least double at each iteration.
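One way to picture a single iteration is as a step over a graph of linked people. The sketch below is a toy under stated assumptions: the `expand` function, the five people and the field names are invented for illustration, and using sets means duplicate fields are dropped automatically.

```python
def expand(descriptions: dict[str, set[str]],
           links: dict[str, set[str]]) -> dict[str, set[str]]:
    """One iteration: each person imports the current descriptions
    of everyone they are linked to. Duplicates vanish in the union."""
    return {
        person: fields.union(*(descriptions[other]
                               for other in links.get(person, ())))
        for person, fields in descriptions.items()
    }


# Five hypothetical people linked in a chain: a - b - c - d - e.
people = ["a", "b", "c", "d", "e"]
links: dict[str, set[str]] = {p: set() for p in people}
for left, right in zip(people, people[1:]):
    links[left].add(right)
    links[right].add(left)

# Everyone starts with a single field of their own.
descriptions = {p: {f"{p}:birthplace"} for p in people}

step1 = expand(descriptions, links)
step2 = expand(step1, links)
# The person in the middle goes from 1 field to 3, then to 5.
print(len(descriptions["c"]), len(step1["c"]), len(step2["c"]))
```

A chain is the slowest case, since nobody has more than two links. How fast a real description grows depends on how many links each person has in the records.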

This is an exponential increase, in the literal, mathematical sense of the term, not as a synonym for “very fast” as in popular usage.

It’s hard to say how quickly one could go through the iterations, but even starting with a small amount of information and a small number of social relationships, fewer than 33 iterations would cover the entire human population.
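The figure of 33 can be checked with a little arithmetic. The sketch below assumes a round world population of 8 billion (my figure, not one from a source) and counts how many multiplications it takes to reach it, starting from one person. The function name `iterations_to_cover` is made up for the example.

```python
WORLD_POPULATION = 8_000_000_000  # assumed round figure


def iterations_to_cover(population: int, growth_factor: int = 2,
                        start: int = 1) -> int:
    """Count the steps needed for `start`, multiplied by
    `growth_factor` at each step, to reach `population`."""
    reached = start
    steps = 0
    while reached < population:
        reached *= growth_factor
        steps += 1
    return steps


print(iterations_to_cover(WORLD_POPULATION, 2))  # 33, since 2**33 is about 8.6 billion
print(iterations_to_cover(WORLD_POPULATION, 3))  # 21
```

With plain doubling, 33 steps are enough. If the growth at each step is anything more than double, fewer steps are needed.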

For individuals in the developed world, that would be a staggering amount of information, all from public records alone.

Relatively harmless information not in the public record

Information about your personal social network is not in the public record, which does not say who your friends are. If that information is added, it becomes much easier to expand your data record, and the record would be more accurate.

Detailed information about your education, including courses taken and grades achieved, is not a matter of public record and is probably something you’d rather keep secret, but it would lead to a more accurate picture of you.

Illegally obtained information about you

I have written about malicious people creating long and detailed records for everyone. Census data with identifying information is available within the government, but not legally accessible. Income tax records are secret, but could be stolen.

There are other facts known to government agencies which are legally protected. Credit card information, including what purchases you made, is available to those who can hack in and steal it.

But what is this data good for?

I should correct that to “What is this useful for?”

Good, or Bad?

Let’s start with an example involving law enforcement.

Train a machine learning device by providing it with the records of a wide variety of people known to have committed murder, plus a wide variety of people known to be innocent of that crime.

Then supply the very long records of a series of suspects to the device. It should be able to identify those likely to have committed the murder.

This seems useful and probably a good thing.

Now imagine malicious people training a machine learning device on people who have outstanding character and those who have been revealed to have skeletons in their closets — sexual predators, embezzlers, adulterers, crooked politicians, undesirable people of all kinds.

Then use the trained device on people who seem innocent only because they haven’t been caught yet. It should be able to reveal those most susceptible to blackmail and what their sins were.

Similarly, train a device on a set of honest politicians and a set of ones known to be corrupt. Suppose that the trained device has been created by a foreign government. It could be used to bring down any democratic state.

If enough data is available through the use of the algorithm described above, all of those things are not just possible, they are easy.

Briefly, this algorithm and the data it can collect could either make society work, or destroy it.