LM_NET Archive



Hello All,

I recently posed a question to the group about why Google was showing lower
numbers of hits lately than they have in the past.  A lot of people
expressed interest in any answers I could find.  I was referred to Greg
Notess, librarian and professor at Montana State who writes and speaks about
the Internet and runs the Search Engine Showdown website.  I thought Google
used the NEAR operator and was disappointed to learn that they do not.

This was my question:
For the past few years, I have used a lesson I wrote to emphasize the
importance of knowing how to search the internet effectively.  I have
performed the same Boolean search on Google each year with ever expanding
results (see below).  It definitely brings home the importance to students
(and teachers) when they see how many more websites are added each year.
This year when I ran my search, I received far fewer hits than in the
past--whales and fish was at slightly over 2 million, whales near fish at
1,930,000, etc.  I know it fluctuates but this seems like a drastic drop.
Several other media specialists emailed me saying they had the same
results.

I did some online searching trying to find an answer and found others
blogging about this--complaining that they felt Google was placing some
sites higher for the sake of ad revenues.  I also found a November 2006
story where Marissa Mayer, a VP at Google, said that more results slowed
down response time and "Half a second delay caused a 20% drop in traffic.
Half a second delay killed user satisfaction."  But there seems to be
confusion over exactly what she meant.

I have been wondering if they are now limiting results to keep response time
quick and if this might cause some selective placement--Google is after all
a corporation with profit as the end goal.  Several people emailed me that
you were the person to ask about this (since I haven't figured out a way to
contact anyone at Google).

As an educator, I am hoping this is not a portent that Google will
eventually become a pay-per-use search engine.  As often as I push students
toward databases that the state provides or that we pay for, they all love
Google and they are not going to stop using it.



His response is below.

Good question, although I'm not sure if you'll like my answers. Feel free to
share as you wish with LM_NET.

First of all, the numbers that Google gives, especially the large ones, are
just very inaccurate estimates. Note that Google tries to make that clear by
saying "about 1,930,000" and that the large numbers always end with lots of
zeros. I tend to interpret a number like "about 1,930,000" as truly being
something between about 500,000 and 5 million, but we do not really know
because we do not get an accurate count from Google. The numbers can change
anytime Google changes an algorithm EVEN THOUGH the actual number of results
in Google's database may not change. Consequently, a year to year (or even
day to day) comparison when numbers are that high likely does not truly show
that Google's database is growing or shrinking but only that their
estimation algorithm might be changing. While the numbers for the past
several years rose, that may or may not actually represent growth of the Web
(or at least that portion indexed by Google).
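To make the point about loose estimates concrete, here is a minimal Python
sketch. It assumes the spread in the example above ("about 1,930,000" read as
something between 500,000 and 5 million) can be applied proportionally to any
reported count. That proportional band and the function names are illustrative
assumptions, not anything Google documents.

```python
import re

# Assumed spread, taken from the single example in the message:
# "about 1,930,000" read as somewhere between 500,000 and 5 million.
REFERENCE = 1_930_000
REFERENCE_LOW = 500_000
REFERENCE_HIGH = 5_000_000


def parse_estimate(text):
    """Pull the integer out of a string such as 'about 1,930,000'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


def plausible_range(text):
    """Return a (low, high) band around a reported estimate."""
    n = parse_estimate(text)
    # Integer arithmetic keeps the reference example exact.
    return n * REFERENCE_LOW // REFERENCE, n * REFERENCE_HIGH // REFERENCE


def ranges_overlap(a, b):
    """True when the bands around two reported estimates overlap."""
    low_a, high_a = plausible_range(a)
    low_b, high_b = plausible_range(b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    print(plausible_range("about 1,930,000"))
    print(ranges_overlap("about 2,000,000", "about 1,930,000"))
```

Under this assumption, two counts whose bands overlap cannot be told apart, so
a change from one to the other says little about whether the index grew or
shrank.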

To get a more accurate comparison over time, use search words that find far
fewer results. For example, at the moment, with Google set to display 100
results at a time, I get the following number:
    tarsiers warblers   "about 617"
Using more unique phrases such as
    "colquitt county high school"
also gets smaller numbers that may be more accurate.

Another issue I should note is that the search syntax you appear to be
using is probably doing something slightly different than you expect. For
    whales, fish
the comma is ignored and should make no difference in the search if you
leave it out.

Unfortunately, Google does not have a NEAR operator (AltaVista used to but
now only Exalead supports it). So the search of
    whales NEAR fish
actually searches for all three of those words.

The OR operator is supported, so the search
    whales OR fish [or for that matter tarsiers OR warblers]
should work correctly, but the numbers push so high as to run into the
Google estimation problem. At least it does point out that a searcher will
find many more results with the OR operator than when no operator is
included and the search is processed as an AND.

The NOT operator is not supported as such in Google. Again, the search
    whales NOT fish
searches for all three words. To get a NOT operation, Google does support
the minus (-) operator. So, to continue my example,
    tarsiers -warblers
finds "about 93,800", which is less than the "about 94,500" that tarsiers
alone reports.
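The syntax rules described above can be sketched as a small Python model. It
follows only what this message states: commas are ignored, a bare list of
words is treated as AND, OR is an operator, a leading minus excludes a word,
and NEAR and NOT are searched as ordinary words. The function names and the
five-line document collection are made up for illustration; this is not how
Google is actually implemented.

```python
def interpret(query):
    """Split a query into OR-groups of required words plus excluded words."""
    groups, excluded = [], []
    join_next = False  # True right after an OR operator
    for token in query.replace(",", " ").split():
        if token == "OR":
            join_next = bool(groups)
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
            join_next = False
        elif join_next:
            groups[-1].add(token.lower())
            join_next = False
        else:
            # NEAR and NOT land here and become ordinary search words.
            groups.append({token.lower()})
    return groups, excluded


def search(query, docs):
    """Return the names of the documents that satisfy the query."""
    groups, excluded = interpret(query)
    hits = []
    for name, text in docs.items():
        words = set(text.lower().split())
        has_required = all(group & words for group in groups)
        has_excluded = any(word in words for word in excluded)
        if has_required and not has_excluded:
            hits.append(name)
    return sorted(hits)


# Toy collection, invented for this example.
DOCS = {
    "a": "whales eat fish",
    "b": "whales are mammals",
    "c": "fish swim near whales",
    "d": "fish have gills",
    "e": "whales are not fish",
}

if __name__ == "__main__":
    for q in ["whales fish", "whales, fish", "whales OR fish",
              "whales NEAR fish", "whales NOT fish", "whales -fish"]:
        print(q, "->", search(q, DOCS))
```

In this toy collection, "whales NEAR fish" and "whales NOT fish" each match
only the one document that happens to contain the word "near" or "not", while
"whales -fish" is the query that actually excludes fish.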

Sorry to go on at such length, but I hope you find the information helpful.
Let me know if you have any follow-up questions.


Greg R. Notess   greg@notess.com


notess.com
1-406-585-2287 ; 1-253-390-7391 (fax)
  SearchEngineShowdown.com ; Internet columnist for ONLINE
  Author of Teaching Web Search Skills notess.com/teaching



And I am adding (again) Laura Pearle's response, which she posted the other
day:

Here's one response I got from someone who knows a little bit about
these things:

I do have some thoughts on the subject.  My guess is that it's a
combination of all of the items below:

1. Google doesn't publish their algorithms, but many search engines
prioritize results of their paying customers.  That, and ads, is why
the service is "free".  I'm not aware of a search engine explicitly
filtering competitors of their customers, but pushing a result lower
makes it less likely to be viewed (see item 4).  On the other hand, a
number of folks have noted that google rarely seems to find
anti-google sites in their searches (yahoo does ;-), so they appear to
be filtering at least somewhat based on business concerns.  I rate
this plausible.

2. It is known that the algorithm does count numbers of links to a
site in ranking, so that may be part of it.  Sites that aren't linked
to as heavily fall down lower in the results.  We may be seeing a
concentration of sites with a high number of links.  That would push
some of the older hits down lower in the results (again, see number
4).

3. Sites now have the ability to tell search engines that they are not
to be indexed.  This didn't use to be the case.  This is becoming more
common, particularly for sites like news and subscription services.
Telling search engines to not index you means that you now get skipped.
This results in fewer hits.  I rate this as likely part of the answer.

4. Google may well be clipping the number of responses.  Few people
look at the results beyond a few pages.  If google says it found
10,000 results, I don't generally look at them all (really, more than
100 pages is too much.  I look at the first 5 or so).  Clipping makes
sense because google is fanatic about only presenting information that
is meaningful.  They have been known to drop advertisers that pay them
money simply because not enough people clicked on their ads...and
google wants the ads to be thought of as useful by the end user...not
just clutter.  I rate this as likely part of the answer.
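Item 3 above does not say which mechanism sites use to opt out of indexing.
One widely used mechanism is a robots.txt file, and the sketch below assumes
that is what is meant. It uses Python's standard urllib.robotparser module;
the site, the crawler name "ExampleBot", and the rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary subscription news site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /subscribers/
"""


def may_crawl(robots_txt, agent, url):
    """Report whether the given rules let this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://news.example.com/free/story.html",
                "https://news.example.com/subscribers/story.html"):
        print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that honors these rules skips everything under /subscribers/, so
those pages never appear in its results.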

As for the privacy bit, you are trading your privacy in exchange for a
free service (part of the advertising-based model).  The government
already has the ability and right to intercept web traffic (NSA does,
FBI has systems like Carnivore, etc).  Much of this is now governed by
the expanded CALEA (a wiretapping law).  You should assume that
your ISP, the government, and google all have access to anything you
type.
Anyone who wants to limit this should use one of the internet
anonymizers and make sure they aren't logged into google for things
like gmail.  I use megaproxy, but there are free ones out there like
tor.

Hope this helps...
__________


So, to summarize, it is probably the result of changing algorithms to keep
time down and revenues up.  And I KNOW we don't need or want a billion
websites on a topic.  It does seem to me that this is perhaps a step in the
direction of Google becoming a pay-per-use site, or less reliable if it is a
pay-for-placement site.  We all want our students and teachers to use
databases and evaluate sites thoroughly, but we know that they do not and
Google is now a verb.....

I really only began investigating because it blew my lesson up--which was
apparently not based on reliable numbers in the first place though it was
effective in teaching the importance of learning search skills.  Several
other people who used my lesson contacted me because they had the same issue
with numbers.  I have to admit that what I learned was interesting.  Perhaps
we have reached the point where the Internet is 'too big to measure' (so
much for that TV commercial where the guy gets to the end of the Internet!).
-- 

Cheryl Youse, MLS
Media Specialist
Colquitt County High School
cyouse@gmail.com

--------------------------------------------------------------------
Please note: All LM_NET postings are protected by copyright law.
--------------------------------------------------------------------