Origins

I am sure that the ideas of Very Large Scale Social Data Collection and Recursive Exhaustion as I use them here have both been discovered before, but I feel I deserve some credit for working them out on my own.   Let me distinguish between them.  Very large scale social data collection is essentially the idea of extracting large amounts of information about each and every person in the world.  Recursive exhaustion is an algorithm for doing so.

In principle recursive exhaustion could be used on an artificial “small world” which is algebraically closed.  But the basic idea is to use it to collect information about everybody.

As I have written on my Books website, my attempts to express my ideas in fiction were not successful in the usual sense of producing a good piece of fiction. Quite the opposite. But making those attempts was tremendously productive in bringing forth new ideas and polishing the existing ones.

As fiction, my books have been terrible.  One major fault they all had was the lack of any significant antagonist.  What I wrote was hopelessly optimistic, describing an ever improving future.  I recently tried to write a new book without that flaw.  To do so I imagined an evil organization which used a massive amount of information about individuals to find ones susceptible to blackmail, bribery and intimidation.  I gave them the ability to find out about people’s lives in extreme detail.

Always given to an engineering approach to problems, I asked myself how this could actually be done.  If tasked with the problem myself, how would I approach it?

To make this a harder problem rather than an easier one, I assumed that the evil people did not depend on hackers to steal information. Yes, I included that as an extra source of information, but I didn’t want it to be a requirement.

As a natural part of the narrative I wanted a protagonist, or an organization of protagonists, who would use the same basic method but start with only public domain information. Could the good people ever compete with the bad ones?

Again, I approached this from an engineering perspective. If I were one of the good guys, committed to using only public domain information, would there be any way of competing against people who had no such scruples?

Yes! Let us suppose that all of the original data was expressed numerically, as vectors. There is a lot of public domain information which could be used to multiply the original data, however little that was. In particular, public records list the parents of a person. Suppose the same basic data was available on each of those. Their vectors could be appended to yours. With this second order data included, the resulting record would have three times as many fields as the original. Even without considering genealogical matters, there are a lot of ways second order data could be added.
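To make the arithmetic concrete, here is a minimal sketch of appending parental data to a person's own vector. The names `base`, `parents` and `second_order_record`, and the toy numbers, are invented for illustration; nothing here is a finished design.

```python
from typing import Dict, List, Tuple

# Hypothetical first order data: every person has a vector of the same length.
base: Dict[str, List[float]] = {
    "you": [1.0, 2.0],
    "mother": [3.0, 4.0],
    "father": [5.0, 6.0],
}

# Hypothetical public record of parentage.
parents: Dict[str, Tuple[str, str]] = {"you": ("mother", "father")}


def second_order_record(person: str) -> List[float]:
    """Return the person's own vector followed by each parent's vector."""
    record = list(base[person])
    for parent in parents[person]:
        record.extend(base[parent])
    return record


record = second_order_record("you")
print(len(base["you"]), len(record))  # field count before and after
```

With two parents, the record holds the original fields plus two more copies of the same layout, which is where the factor of three comes from.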

Then there could be third and fourth order information.  Wouldn’t the higher order information be of less and less value?  No, but yes.  Third order information obtained by genealogical methods of multiplying data is a good example.  The facts known about your grandfather are less significant than those known about your father.  But there is more of this third order information because you (in almost all cases) have four grandparents.
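The growth in genealogical data can be sketched in a few lines. This assumes an idealized family tree in which every ancestor is known and distinct, which real records will not always satisfy; the function names are my own for illustration.

```python
def ancestors_at_order(order: int) -> int:
    """People contributing at a given order: 1 (you), 2 parents, 4 grandparents, ..."""
    return 2 ** (order - 1)


def total_fields(base_fields: int, max_order: int) -> int:
    """Fields in a record that includes every order from 1 up to max_order."""
    people = sum(ancestors_at_order(k) for k in range(1, max_order + 1))
    return base_fields * people


for k in range(1, 5):
    print(k, ancestors_at_order(k), total_fields(10, k))
```

Under these assumptions a 10-field record grows to 30 fields at second order and 70 at third order, so each new order adds more fields than all the previous orders combined, even though each individual field matters less.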

The best way to look at this is through analogy with a radio telescope array. Each receiver provides little more information than is available from the ones next to it. But the vast number of receivers allows their combined signals to contain a lot more information than any smaller subset of them.

Though information about friends and acquaintances is not public domain, let’s pretend that it is. Whatever is known about you could be extended by the addition of second order information from your friends and acquaintances. It is common to say that no person on Earth is more than six links away from you in the social network. If so, then sixth order information should include that from every person in the world. Clearly information from the more socially distant people is of less value, but there are many more of them. Each need contribute less.
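One way to picture this is a breadth-first walk outward from one person, with each person's contribution shrinking as the number of links grows. The toy network, the `values` scores and the `decay` factor of 0.5 below are all assumptions made up for the sketch, not part of the method as described.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy social network (each link listed in both directions).
graph: Dict[str, List[str]] = {
    "you": ["ann", "bob"],
    "ann": ["you", "cy"],
    "bob": ["you"],
    "cy": ["ann", "dee"],
    "dee": ["cy"],
}


def link_distances(start: str) -> Dict[str, int]:
    """Breadth-first search: number of links from start to each reachable person."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in graph.get(person, []):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance


def combined_value(values: Dict[str, float], start: str, decay: float = 0.5) -> float:
    """Sum everyone's value, weighted down by decay for each link of distance."""
    return sum(values[p] * decay ** d for p, d in link_distances(start).items())


distances = link_distances("you")
total = combined_value({name: 1.0 for name in graph}, "you")
print(distances, total)
```

In a real network the number of people at each distance grows quickly, so even a steep decay can leave the distant rings contributing a meaningful share of the total.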

I will continue to explain the details of this idea in various pages and posts on this site; all I wish to make clear now are the circumstances and thought process which led me to my own discovery of these ideas. Anyone else who has discovered them independently is invited to tell their own story, which I will be glad to publish here.