Search Engine Optimisation

Today there are search engine and internet marketing services; in fact a new industry has materialised to exploit the fear of low search rankings.

This is not a new trend. Back when simply resubmitting your website to the engines kept your site at the top of the index, there was an accompanying boom in resubmitting "companies"; as we know, these were just men in back bedrooms with a host of CGI and Perl submitting scripts and a timetable.

Search engine optimisation, or "SEO", is the latest incarnation of this bedroom profiteering. The important difference is that now the webmasters are not just passively involved but are being forced to adopt totally artificial and unsocial practices that ultimately serve only to help damage the Internet!

SEO is supposedly the methodology and processes related to designing search engine "friendly" web content. The basic premise is something like "If I follow all the engines' formatting and connectivity criteria, then my website will rank higher than a comparable website that does not".

All other things being equal, this seems quite positive. Given that the quality of a search engine's database (index) directly affects its output, webmasters optimising their content so that search engines can correctly categorise the internet should logically improve the speed and quality of "the crawl".

SEO then, logically, should be good for the search providers: being able to maintain an efficient index should use less raw processing power, require less equipment and thus less energy. It must also be good for the users, who would be able to quickly and intuitively find what they want from a reliable source. Sounds reasonable, right?

Well, that's the happy version. The fact is that initially this may be true and you may gain a short term advantage, but once we have all optimised our content for analysis and (in so doing) ignored our users, we will be back where we started. The search providers will just think up some even more ridiculous "laws" by which to "judge" us, and like sheep we will all follow those as well; thus the causal paradox is perpetuated and the users feel abused!

Even this is a vast oversimplification; the true nature of SEO is a lot more complicated. The heart of the problem is the search providers' task, which is to strip mine the information junk yard otherwise known as the Internet. It may be full of interesting stuff but there is also plenty of garbage, and they need to devise intelligent techniques to mine the interesting stuff!

The current "solution" is literally for the search engines to use their hegemonic standing to bully the webmasters into organising their work in ways that have the primary effect of allowing quick "analysis" so they can categorise the website. This has the secondary effect of requiring content to be designed "for" analysis, which typically translates to highly distributed connectivity, ie the website being effectively divided into "micro sites", which makes the maintenance of links and content more troublesome!

This is not necessarily a bad thing; most of these imposed linking and design methodologies are positive and beneficial for a lot of subjects. My problem is that this is unilaterally enforced, and it is this type of issue that is generating all the money for the SEO boys.

However, this will soon be of no consequence. To understand the problem with this type of SEO operation, it is necessary to think about how we can approximate and simulate the human process of mining information and knowledge.

Let us assume we have set our crawlers to work, automatically indexing pages (at random, looking at previous indexing and guided by user requests). We then format the resulting text: ASCII is usually used and validation follows this; search engines tend to ignore some tags and make use of good ones that help identify the content. At this point we would have reduced the Internet to a corpus, ie the collection of all HTML documents about no particular subject.

We would then set about item normalisation, ie identification of tokens (words), characterisation of tokens (tagging meaning to words), and finally running stemming algorithms to remove suffixes (and/or prefixes) to derive the final database of terms. This can be efficiently and compactly represented in lower term dimensional spaces (Google are still essentially using inverted file structures).

Imagine each document of a corpus as a point in an N dimensional space with one axis per term. Here the literal word matching type of search is lost, but we acquire more of a semantic flavour, where closely related information can be grouped into clusters of documents bearing similarities. However, N dimensional vector spaces are of no help to the users.

After applying our algorithms to the corpora, we get a term by document matrix, where terms and documents are represented by vectors; a query can also be represented by a vector. So we have a query and our corpora (represented as vectors, both having the same dimensions), and we can now start matching the query against all the available documents using the cosine of the angle between the two vectors.

But we now have a new artificial "problem". We know the general answer to the question "which websites best match my search terms"; this information now exists in our mathematical object, at a high level of abstraction, ie the cosine angles for all document vectors against the query vector. This is expressed as a vector corresponding to the sought column and therefore the document we are after. All we need do is present this to the user, right? Well....

The issue is that a search engine needs to generate a linear index, ie convert the vectors corresponding to the smallest cosine angles into a human readable format. Until such time as someone thinks of a better way to do it, all engines output lists. Like your shopping list, a list has a start, a middle and an end, and therein lies the problem: how to order the list!

The hypothesis seems simple: order information that might look chaotic at first, using the fact that closely associated documents tend to be relevant to similar requests. However, the internet (being a scale free network) is so vast that it is not possible to present a chosen feature space that represents the x closest documents to the convergence point in a given cluster, measured by the common Euclidean distance. This is what should then be presented to the user in a more intelligible (semantic) display.

The engines could just present the returns as produced by the matching algorithms after decomposition. A grouping generated using probabilistic/fuzzy patterns directly from the cluster might belong to more than one class, but a strength (degree of membership) value, measured as a probability on the [0,1] interval, is quite adequate.

The reason singular value decomposition works for ordering is that when the occurrence of two terms (say tomato and potato) is very high, this is reflected in the term-by-document matrix, which shows that only x of the n terms are used very frequently.

The idea is that since a term, say pepper, is used/mentioned very little, its axis dimension does not much affect the search space, making it flat and relevant only in the other two dimensions.

However, the engines' demonic creators can't do this because they are still essentially using an inverted file structure, yet they still want absolute correctness in their indexes and returned results. That means trouble, because it assumes your index is perfect, incapable of being manipulated, and that you can somehow order the returns in a meaningful way!

So the returned results can't generally represent the documents that match semantically, and we now need to account for some subjective quantities that cannot be derived directly from the corpora. They attempt to deal with this by a cocktail of criteria that rank the returns in such a way that it is more likely that the "better" results are closer to the top of the list.

There are many ways of doing this. The current trend is to use inference about the quality of web sites where possible, because such quantities are beyond the direct control of the content creators and the webmasters.

PageRank provides a more sophisticated way of citation counting, one embodied in the concept of link analysis, using a relative value of importance for a page measured based on the average number of citations per reference item.

PageRank is currently one of the main ways to determine who gets to the top of the listings, but soon this will all become irrelevant when the engines stop using inverted file structures, because they can just use the grouping generated using probabilistic fuzzy patterns resulting from the convergence point in a given cluster from the common Euclidean distance.

When the changeover from inverted file structures occurs, there will be two direct consequences:

The corpora will be capable of vastly more representative and more detailed data than is currently possible.

The corpora will no longer be indexed as is currently done; they will embody semantic meaning and value, where some subjective quantities can be derived directly from the corpora without the need for cocktails or totally artificial rules.

The effect is that corpora will be more accurate and incapable of manipulation, thus variations of SEO that involve indirect manipulation of the index will become pointless overnight.

It is worth noting that the search providers are becoming increasingly pessimistic about website promotion in all forms. They currently penalise many things that can affect the results, such as duplicated content (which can be perfectly legitimate) and satellite sites, ie one webmaster interlinking seemingly separate but highly relevant websites.

They may well start penalising webmasters that promote their websites through articles they submit for third party distribution, as they do for people that post their site's information to bulletin boards!

Being banned from the top search engines can effectively destroy your business, if not directly through loss of visibility then indirectly, in that people tend to judge you on whether you are organised enough to be listed!

The criteria are continually changing as the amoral SEO boys attempt to pervert the results. These "laws" are not always clear and there are no appeals; we are all subject to the providers upending a drum and then dispensing swift and hard "judgements" that can doom us at any time!

The part that irks the most is that as the indexes converge (Google's index is used directly by 2 of the 3 top engines and 5 others indirectly use it for their rankings), a ban by any one of these engines is enforced by them all.
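
For readers who want to see the vector matching step from the middle of this article in concrete form, here is a minimal sketch of a term by document matrix and cosine matching against a query. It is only an illustration under stated assumptions: the three toy documents, the names DOCS, tokenize, build_matrix, cosine and rank, and the use of raw term counts are all made up for this example, stemming and decomposition are left out, and no real engine is claimed to work this way.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real corpora are crawled and normalised first.
DOCS = {
    "doc_a": "tomato potato soup recipe",
    "doc_b": "potato tomato salad",
    "doc_c": "perl cgi submission script",
}


def tokenize(text):
    """Identify tokens by lowercasing and splitting on whitespace (no stemming)."""
    return text.lower().split()


def build_matrix(docs):
    """Return the sorted vocabulary and one term-count column per document."""
    vocab = sorted({term for text in docs.values() for term in tokenize(text)})
    columns = {}
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        columns[name] = [counts[term] for term in vocab]
    return vocab, columns


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Order documents by descending cosine against the query vector."""
    vocab, columns = build_matrix(docs)
    query_counts = Counter(tokenize(query))
    query_vector = [query_counts[term] for term in vocab]
    scored = [(cosine(query_vector, column), name) for name, column in columns.items()]
    # Highest cosine (smallest angle) first; ties broken by document name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for score, name in rank("tomato potato", DOCS):
        print(f"{name}: {score:.3f}")
```

Note that even this toy version has to flatten the result into an ordered list, which is exactly the ordering problem described above: the cosine values exist for every document at once, but the user only ever sees a start, a middle and an end.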