As search engines have developed, so have the people who break the rules and try to get their sites onto the first page by any means. Three or four years ago, many search spam methods were beyond Google's ability to detect; most of them have now faded into obscurity, giving way to more refined methods that are more expensive to implement. Spam detection in Google is completely automated today: moderators do not interfere with the work of the algorithms and do not remove sites by hand.
Over time, webmasters have drawn some definite conclusions about this search engine's algorithms. In some cases we can now say what is good and what is bad for Google. Naturally, these are optimizers' guesses, but some of them may well be true.
What is the Google Sandbox.
This has perhaps been the hottest topic among webmasters all over the world since 2004, when the phenomenon first appeared, and interest in it has not faded. As a result of various algorithm improvements, a new site cannot reach the first page of Google results for competitive queries. A common opinion is that the search engine simply does not let new sites appear for competitive queries for 6 months. That is not quite true; let's analyze why.
The first thing to pay attention to is what representatives of this search engine have said publicly. They state that they did not develop anything like this on purpose, and that the effect is a consequence of various factors. These factors cannot be the same for all sites; they are calculated depending on the situation.
The basic idea behind the Sandbox is that a site which has only recently appeared on the web cannot be authoritative, so its voice does not need to be taken into account yet. People first have to discover it, evaluate it, and express their opinion, and that physically takes time. A site that suddenly receives many links out of nowhere also looks unnatural, while its competitors do not. Depending on the situation, Google calculates a threshold value for each site, and this value also depends on the search query. Until the site meets all the requirements, it either cannot be found among the first 1000 results or, if the total number of documents relevant to the query is small, sits very far from the first page.
Many webmasters are indignant: "How is that?! I have such an interesting site!" Almost every discussion boils down to sanctions and bans, and nobody looks at the technical side of the question at all. For most people the Sandbox is a black box: we have some data at the input and some at the output, but it is impossible to connect them into a whole and determine precisely how the factors interact. Every webmaster or site promotion expert interprets this black box in their own way, which is why there are so many fantastic stories and recipes for overcoming it. This looks ridiculous given the huge intellectual and technical resources of Google's experts. It is simply naive to hope for a straightforward recipe that works in every case.
Let's at least try to guess what Google's grand formulas might include: which factors matter, and to what degree.
Let's begin with the time factors. From the history of Google as a company, we know that it has already registered several patents on site ranking. Dates such as the following can now be used by the algorithms to determine relevance: the domain registration date, the date the site was first indexed, the dates on which external links appeared and the dynamics of their growth, the dates of all text changes on the site, and others.
Having read these patents carefully, we can assume that:
- The earlier the site is registered, the better;
- The earlier it is indexed for the first time, the better;
- The longer external links to a site have existed, the more weight they carry;
- It is good if external links appear naturally and not too quickly;
- It is good if the site is updated often;
- It is bad if the site sharply changes its subject at some point;
- Over time, the influence of all factors, both internal and external, grows stronger.
Although search engines rank individual pages, it is reasonable to assume that some factors depend on the site as a whole. These can be all the time factors mentioned above, as well as the site's internal content. It is also possible that Google keeps statistics on all sites on the web and knows what an average good site looks like and how, and in what sequence, it should develop. If so, it is better not to be overzealous with content optimization at the early stage of a site's development, and not to chase keyword density and the like.
External factors certainly influence how this algorithm treats any particular site. What are they? Links, of course. All external links and their text can influence the calculation of the Sandbox threshold. What counts is the number of links, the dynamics of their appearance, which sites they come from, and which keywords they contain. Many external links appearing within a short time is bad. Links from bad sites are no advantage. Links with identical text, and participation in automatic link exchange systems and link rings, are bad. Natural links and links from authoritative sites are good.
What else can take part in the Sandbox algorithms? Quite obviously, the query you enter into the search engine. It depends largely on the query whether your site takes part in the main group of sites or falls under the restrictive Sandbox algorithm.
Let's see what properties a search query has in general. Each query has a frequency with which it is searched. There are widely popular queries, for which this figure is higher, and there are queries that are typed rarely. A lookup in the index for a given phrase returns a certain number of relevant documents. There is also a certain number of links on the web containing these keywords, and Google knows all of these numbers.
Knowing the query, you can find out whether advertisers buy ads for it in Google AdWords and how strong the overall competition is. With such statistics on frequency, competition, and money spent, Google can group sites by whether they are commercial or not, and automatically adjust the factors used in the Sandbox calculation accordingly.
Google representatives once said that, in terms of quality control, popular queries receive more attention than unpopular ones. Naturally, all of this is done automatically; that is, analyzing popular topics takes up more processor time across the servers. For very popular, commercially attractive queries we see almost no bad sites or doorways, whereas for topics prohibited in AdWords, or for rarely typed queries, doorways can make up almost 90% of the first page. This is because the more complex algorithms only start working above certain frequencies; there is no point in spending additional resources on recalculating all the parameters when it is not justified. In the end, what determines the quality of search? Simply whether the user is satisfied with what was offered in response to the query. If the user searched for something, found it, and was happy, the search is of good quality.
The user is also part of this system. The country the query is submitted from and the regional version of Google used can affect the ranking results. The extent of this influence, however, depends on the same frequency characteristics of the search query, taken in the context of the regional features.
Taking all of the above into account, we can propose the following generalized scheme of the Sandbox.
Picture 1. Generalized scheme of the Sandbox
TDOMAIN – all time characteristics of a domain
XREQUEST – all frequency characteristics of the search request
XSITE – all internal content characteristics of a site
XEXTERNAL – all external characteristics of a site (links and PageRank)
XUSER – all characteristics of the user making a request
K – different coefficients and the influence of other factors not accounted for
Now that the most likely factors have been considered, we can draw some conclusions, apply this scheme to what we encounter in practice, and check whether our assumptions hold.
The time characteristics of the domain can only reduce the threshold value. However, the domain registration date, the date of first indexing, and the dates on which backlinks appeared interact with one another. If you simply register a domain and set it aside for a while, without developing the site or placing external links, the effect on the threshold will be minimal. The more important components are the date of first indexing and the dates and growth dynamics of external links.
The more popular the query, the larger the coefficients used to calculate the threshold value. The commercial side of the query and its subject are also taken into account.
The internal content characteristics of the site have minimal influence on the threshold calculation, but an excessive number of keywords (over-optimization of the site) raises the threshold.
External characteristics of the site can both shorten and lengthen the time spent under this filter. There appears to be a "golden mean": as long as you stay within its limits on both sides, the threshold will at least not increase. What the coefficients of this interaction are, how many links should be placed, how fast, from which sites, and with what PageRank, we cannot say precisely. These parameters are calculated dynamically for each particular query and site. For example, a new site should not get its links from equally new sites, nor should it immediately get links from pages with PageRank 8.
Search results can differ depending on who submits the query and in which regional version of Google. How can we explain, for example, that for Russian-language queries this filter is either not observed at all or only on the most competitive topics? Very simply: these queries are not that popular in Google. People do not use this search engine for them very often, so the frequency characteristics of Russian queries are lower than those of English ones, and the algorithms are somewhat softer. It all comes down to query competition and the number of sites in the result set. The search engine works with words and phrases in different languages. Each language has its own features, but this has almost no effect on technical tasks such as searching the database and calculating mathematical formulas.
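The qualitative conclusions above can be collected into a toy model. This is only an illustrative sketch: the field names, coefficients, and cut-off values are invented for the example and say nothing about how Google actually calculates anything. It only encodes the directions of influence discussed here: time factors lower the threshold, popular and commercial queries raise it, over-optimization raises it, and link growth outside a "golden mean" raises it.

```python
from dataclasses import dataclass

# All names and numbers below are hypothetical, chosen only for illustration.
LOW_VELOCITY = 0.1   # links per day; below this the link profile looks dead
HIGH_VELOCITY = 5.0  # links per day; above this the growth looks unnatural


@dataclass
class SiteSignals:
    days_since_first_index: int  # part of TDOMAIN
    days_links_exist: int        # part of TDOMAIN: age of external links
    query_popularity: float      # XREQUEST: 0.0 (rare) to 1.0 (very popular)
    commercial: bool             # XREQUEST: advertisers compete for the query
    keyword_density: float       # XSITE: share of keywords in the text
    link_velocity: float         # XEXTERNAL: new external links per day


def sandbox_threshold(s: SiteSignals) -> float:
    """Return a made-up threshold; higher means harder to reach page one."""
    # Popular and commercial queries raise the coefficients.
    threshold = 1.0 + 4.0 * s.query_popularity
    if s.commercial:
        threshold *= 1.5
    # Time characteristics can only reduce the threshold.
    index_discount = min(s.days_since_first_index, 365) / 365 * 0.3
    link_discount = min(s.days_links_exist, 365) / 365 * 0.3
    threshold *= 1.0 - index_discount - link_discount
    # Content matters little, except that over-optimization raises the threshold.
    if s.keyword_density > 0.08:
        threshold *= 1.2
    # Leaving the "golden mean" of link growth on either side raises it.
    if not (LOW_VELOCITY <= s.link_velocity <= HIGH_VELOCITY):
        threshold *= 1.3
    return threshold


new_site = SiteSignals(10, 0, 0.9, True, 0.03, 1.0)
old_site = SiteSignals(400, 400, 0.9, True, 0.03, 1.0)
print(sandbox_threshold(new_site) > sandbox_threshold(old_site))  # True
```

The same new site asking for a rare, non-commercial query gets a much lower threshold in this sketch, which matches the observation that the filter is barely noticeable for low-frequency queries.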
Google -30, -950 filters and others.
There is a rumor among optimizers that certain filters lower a site by a specific number of positions in the results for detected violations. Frankly, it would be silly of Google to work this way. Representatives of this and other search engines have stated that nobody can manipulate site positions; it is simply impossible to place a site at a particular position for a given query.
And if you take into account that users in different countries, and even within one country, can be shown different results, a fixed position shift is simply impossible. Yes, there are automatic penalty filters that are imposed on a site for minor violations. Over time they are also removed automatically, provided the errors are fixed.
But the penalty cannot be exactly 30 positions, for example. A site can be penalized by reducing its numerical parameters, such as its real PageRank or the weight of its external links in ranking. As a result, one site will drop 30 positions and another 40.
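A small sketch shows why a penalty applied to a score cannot translate into a fixed number of positions. The scores, site names, and the penalty factor are invented for the example; the only assumption is that results are sorted by some numerical score and that the penalty multiplies that score.

```python
def position(scores: dict[str, float], site: str) -> int:
    """1-based rank of `site` when sites are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(site) + 1


def drop_after_penalty(scores: dict[str, float], site: str, factor: float) -> int:
    """Number of positions `site` loses if its score is multiplied by `factor`."""
    before = position(scores, site)
    penalized = dict(scores)  # copy, so the original scores stay untouched
    penalized[site] = scores[site] * factor
    return position(penalized, site) - before


# The same penalty (score halved) in two hypothetical result sets.
dense_query = {"ours": 100, "c1": 95, "c2": 90, "c3": 85, "c4": 80, "c5": 40}
sparse_query = {"ours": 100, "c1": 70, "c2": 45, "c3": 30}

print(drop_after_penalty(dense_query, "ours", 0.5))   # 4
print(drop_after_penalty(sparse_query, "ours", 0.5))  # 1
```

The identical penalty costs four positions where competitors are tightly packed and only one where they are not, so an observed drop of "exactly 30" says more about the competition than about the filter.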
Recently, threads appeared on a well-known forum for optimizers claiming that there was a new Google -20 filter. Within a few days it turned out that there was a bug in the program used to determine site positions. The program is quite popular and is used by many webmasters. After the developers fixed the bug, the new Google filter disappeared. So any claims about filters that cite exact numbers should be taken with a grain of salt.
Google Supplemental Results filter.
This search engine has two document indexes: a main one and a supplemental one. When Google does not find relevant documents for a query in the main index, it adds results from the supplemental index. Previously, duplicate pages, doorways, pages with many links, and pages whose content Google considered poor ended up in the supplemental index.
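The fallback between the two indexes can be pictured with a short sketch. The data structures and the `wanted` parameter are assumptions made for the illustration (here the supplemental index is consulted whenever the main index returns fewer results than requested); the real mechanism is not public.

```python
def search(
    query: str,
    main_index: dict[str, list[str]],
    supplemental_index: dict[str, list[str]],
    wanted: int = 10,
) -> list[str]:
    """Return up to `wanted` documents, preferring the main index."""
    results = list(main_index.get(query, []))[:wanted]
    if len(results) < wanted:
        # Top up from the supplemental index, skipping documents already shown.
        extra = [d for d in supplemental_index.get(query, []) if d not in results]
        results += extra[: wanted - len(results)]
    return results


main = {"shoes": ["a.example", "b.example"]}
supplemental = {"shoes": ["c.example", "d.example"], "rare phrase": ["x.example"]}

print(search("shoes", main, supplemental, wanted=3))
# ['a.example', 'b.example', 'c.example']
print(search("rare phrase", main, supplemental))
# ['x.example']
```

Documents from the supplemental index only surface when the main index runs short, which is why such pages are seen mostly for rare queries.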
Now the situation has changed a little. Representatives of the search engine have explained which documents are in the supplemental index: documents whose PageRank is insufficient for the main index, and documents with poor content, such as duplicates and empty pages. Because real PageRank is recalculated constantly, a document with unique content can sometimes end up in the supplemental index too. In that case you need to wait a little; you can also add external links to the document. After a while the page will be back in the main index.
On the other hand, we can also assume that if a page has ended up in the supplemental index, its real PageRank has dropped, as opposed to the visible value shown in the Google Toolbar. Your document will then not appear on the first page for a competitive query, but it can still be found for a less competitive one.
The filter for content duplication.
This filter works as follows: from the whole set of duplicates Google picks the main one, and all the others either receive a penalty or go to the supplemental index. So for any query the main site should rank above, and all the others below. However, the principle for deciding which site is more relevant is not as simple as it seems.
For example, the original source of a news item can rank much lower for a query than a site that merely reprinted it. Google takes the site's authority into account first, and only then its content. Therefore, the more authoritative the site, the better its chances of ranking higher in case of duplication.
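As a rough illustration of "authority first", the choice of the main duplicate can be written as a maximum over a sort key. The `Page` fields, the authority numbers, and the tie-break by earliest discovery are all assumptions made for this sketch, not known details of the filter.

```python
from typing import NamedTuple


class Page(NamedTuple):
    url: str
    authority: float     # hypothetical site-level authority score
    first_seen_day: int  # hypothetical day on which the copy was first indexed


def pick_main_duplicate(pages: list[Page]) -> Page:
    """Choose the main copy: highest authority wins, earlier copy breaks ties."""
    return max(pages, key=lambda p: (p.authority, -p.first_seen_day))


original = Page("small-blog.example/news", authority=0.2, first_seen_day=1)
reprint = Page("big-portal.example/news", authority=0.9, first_seen_day=3)

print(pick_main_duplicate([original, reprint]).url)  # big-portal.example/news
```

Under this ordering the reprint on the authoritative portal is treated as the main copy even though the small blog published first, which is exactly the situation described above.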
The filter for identical texts of links.
To improve search quality, Google tries to combat artificial inflation of a site's link popularity. In particular, the links pointing to a site should not all have identical text. A link is, in effect, a webmaster's vote: he decides what exactly he liked about your site and places the link himself.
A coincidence of 100 identical links from different sites will not look natural to Google; it is immediately obvious that the popularity has been inflated artificially. Such a filter may ignore all these links, or count some of them and not others. So optimizers have to get used to making all link texts unique.
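One simple way to picture such a check is to measure how concentrated the link texts are. The function names and the thresholds `max_share` and `min_links` below are invented for the example; this is a sketch of the idea, not a description of Google's filter.

```python
from collections import Counter


def anchor_concentration(anchors: list[str]) -> float:
    """Share of links that use the single most common link text."""
    if not anchors:
        return 0.0
    normalized = [a.strip().lower() for a in anchors]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def looks_artificial(anchors: list[str], max_share: float = 0.5,
                     min_links: int = 10) -> bool:
    """Flag a link profile in which one text dominates (hypothetical thresholds)."""
    return len(anchors) >= min_links and anchor_concentration(anchors) > max_share


bought = ["cheap shoes online"] * 100
natural = ["shoe shop", "this store", "example.com", "nice boots here",
           "cheap shoes online", "my favourite shop", "see this page",
           "Example Shoes", "good prices", "where I buy shoes"]

print(looks_artificial(bought))   # True
print(looks_artificial(natural))  # False
```

A profile of 100 identical texts has a concentration of 1.0 and is flagged at once, while ten different texts give 0.1 and pass.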
There are also many other filters invented by SEO companies and optimizers who provide search engine optimization services. But all of them are only guesses, and until a representative of the search engine confirms them, there is no point in speculating. Google today is a powerful system that combines an enormous number of interconnections and filters, so we just need to be patient and observe.