By Ken Adachi <Editor@educate-yourself.org>
http://educate-yourself.org/cn/googleandbigbrother07sep05.shtml
September 7, 2005
Thanks to this e-mail sent by George Z., I am now aware that
using www.google.com for searches is probably not the wisest thing to do
since they apparently extract information about you and TRACK
the info you are searching for, undoubtedly to wind up in Big Brother
data storage computers. After you read George's intro letter, you can read
the articles from scroogle.org which follow. They explain the technicalities
involved in stripping tracking information from google searches. I'll be
changing the Search page at my web site soon and replacing the link to google
with a link to another, non-tracking search engine. Here are a few to try:
Your site, (http://educate-yourself.org) can be considered
a “public” site and the use of the built-in “Google Search”
(from the search results page) could be considered in a different light.
I will, however, never (knowingly) use this tool – not even from your
site.
I still think that this use is potentially entrapping to the
uninformed - as it implies that it is "OK" to install the Google
toolbar on my personal machine or that the use of Google is "safe."
Granted, they now make disclosures (for the things that they
previously got caught at) in the agreement You “sign” when installing
the tool – but do You read these? – You should read very carefully
(and understand the implications).
Thanks again Ken, for all you do
Best regards,
George Z.
*****
[Note to RBN (http://www.rbnlive.com/)]
Subject: The promotion of (personal data spider) Google on
the various programs
Comments: Greetings to all of You! Permit me to begin by thanking
all of You for the great fight! – The war is not yet won!
As background to the subject, I recommend that you visit http://scroogle.org/
and http://www.google-watch.org/.
Here you will find a wealth of (eye-opening) information, including: “Google
as Big Brother”, and who-is-who in their organization.
I would like to suggest that you consider not using “Google”
as a verb or making any reference to this site – but rather consider
the proposed alternate. We all spend a lot of time and money eliminating
ad-ware and such – so why would we open the doors wide and invite
this one in? I am sure that you would not promote the use of “frequent
buyer” cards at the local grocery super-store; this is [in my humble
opinion] much worse.
You may say in reply that: “Google is the best and will
ensure Me of the best search”; however, the proposed alternate does,
in fact, use “Google” [and / or “Yahoo”] –
only it strips the transfer of ads and the possibility of generating user
profiles including personal information / preferences and tracking data.
For all searches they [Google] record the cookie ID, your
Internet IP address, the time and date, your search terms, and your browser
configuration. Increasingly, Google is customizing results based on your
IP number. This is referred to in the industry as "IP delivery based
on geolocation." This information is retained indefinitely! Google
hires spooks: a key Google engineer used to work for the National Security
Agency. The newly-commissioned data-mining bureaucrats in Washington can
only dream about the sort of slick efficiency that Google has already achieved.
(See http://www.google-watch.org/bigbro.html)
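To make the idea of "stripping" concrete, here is a minimal sketch of how a search proxy might drop identifying request fields before passing a query on. It is not Scroogle's actual code: the header list and the names `strip_identifying_headers` and `build_upstream_request` are assumptions made for illustration. A proxy of this kind also replaces the searcher's IP address with its own, simply because the proxy is the one that makes the request.

```python
from urllib.parse import urlencode

# Assumed list of request headers that can identify or profile a searcher.
IDENTIFYING_HEADERS = {
    "cookie",
    "referer",
    "user-agent",
    "accept-language",
    "x-forwarded-for",
}


def strip_identifying_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers without the identifying fields."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }


def build_upstream_request(search_terms: str, client_headers: dict[str, str]) -> dict:
    """Describe the request a proxy would send on the searcher's behalf.

    Only the search terms and the non-identifying headers are kept.
    """
    return {
        "query_string": urlencode({"q": search_terms}),
        "headers": strip_identifying_headers(client_headers),
    }


if __name__ == "__main__":
    incoming = {
        "Cookie": "ID=abc123",
        "User-Agent": "ExampleBrowser/1.0",
        "Accept": "text/html",
    }
    print(build_upstream_request("privacy tools", incoming))
```

The search terms still reach the search engine, so this kind of stripping only separates the query from the person who typed it.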
To use “Scroogle” you would have to first go to
their site [use a bookmark] and then enter your search keywords, rather
than just searching from the address area or the default search engine of
your browser (automatically configured by another “Big Brother”
for You).
The same goes for the installation of the “Google Search
Tool Bar” on Your personal browser. They already have a ~95% share (a monopoly)
of web searching. This is the (proverbial) inviting of the “Wolf”
into the “Fold” scenario! It also establishes an open link with
(Your hard drive and) the “Wolf Pack”!
In the worst case, I suggest that You use generic terms (in
your broadcasts) for the web-search function.
Please feel free to pass this on, and to use this information
on your broadcasts and web sites.
Best regards to all of You; Keep up the good fight!
If done in the public interest and not for profit, it's legal.
What's more, Google can't block you if they can't find you.
Public Information Research, Inc., the nonprofit public charity
behind http://www.google-watch.org
and http://www.scroogle.org, has been
running a Google proxy for more than two years. On January 3, 2005 we released
the source code for our proxy. Our review of the legal situation has convinced
us that we are covered by "fair use" under the Copyright Act.
This step that we have taken has implications for all search
engines. These engines crawl the public web without asking permission, and
cache and reproduce the content without asking permission, and then use
this information as a carrier for ads that generate private profit. We are
convinced that if citizens scrape Google and strip the ads, and make the
scraped results available as a nonprofit public service, that this is legal.
This is especially the case if there are public policy concerns behind the
scraping.
Google Watch has been the most prominent critic of Google's
outrageous privacy policies for more than two years. This is why we started
the proxy, and it's why we continue the proxy. We invite Google to serve
us with a cease and desist letter as a first step toward resolving this
issue. So far, we have yet to hear from Google's lawyers. By releasing the
source code for our proxy, we're trying to escalate the issue.
If it can be established that what we're doing is legal --
or at least sufficiently legal so that Google is not eager to challenge
us -- then this will begin to restore a public-interest balance to the web
that has been declining ever since big money got behind the dot-coms.
There is the additional problem of whether anyone who scrapes
Google can avoid getting blocked by Google. We experienced this when Google
blocked Scroogle in December, 2003. We moved to a different server and continued
as before, because Google could no longer find us. In our opinion, it's
legal for Google to block whomever they want, even while it's also legal
for us to scrape them if we can.
If the scraping is done properly, it is not worth Google's
trouble to find you. Our source code separates the "fetch" portion
of the program, which is done by curl or wget, from the searcher interface and
parsing of the fetched results. If the fetching is done by a server on a
different Class C address from the website that shows the scraped results,
there is little that Google can do to find the IP address that is responsible
for the actual fetch.
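The "different Class C" condition can be checked mechanically. The sketch below assumes that "Class C" is meant in the loose hosting sense of a /24 block (the first three octets of an IPv4 address); the function name `same_class_c` and the sample addresses, which come from reserved documentation ranges, are illustrative only.

```python
import ipaddress


def same_class_c(addr_a: str, addr_b: str) -> bool:
    """Return True if both IPv4 addresses fall in the same /24 block."""
    # strict=False lets us pass a host address and get its enclosing network.
    block = ipaddress.ip_network(f"{addr_a}/24", strict=False)
    return ipaddress.ip_address(addr_b) in block


if __name__ == "__main__":
    public_site = "192.0.2.10"   # hypothetical address of the results website
    fetcher_near = "192.0.2.200"  # fetcher in the same /24 as the website
    fetcher_far = "198.51.100.7"  # fetcher in an unrelated block

    print(same_class_c(public_site, fetcher_near))
    print(same_class_c(public_site, fetcher_far))
```

A fetcher for which this check returns False is the arrangement the paragraph above describes.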
*****
A Google block requires a John Doe server
Google uses a couple dozen data centers with dedicated IP
addresses. A number of these are located outside the U.S. Once these addresses
are discovered (search for "google data centers"), it is trivial
to maintain the list. The addresses will change over time, but they won't
change that quickly.
If a scraper is coming into Google from an address that is
outside the local IP block where his public interface operates, we believe
that Google is currently ill-equipped to discover him. Yahoo, by contrast,
appears to have a more centralized system, and is able to throttle excessive
activity from a single IP. We saw only two IP addresses for Yahoo when our
Yahoo scraper was active. About two percent of our fetches were throttled.
Google, with a more distributed system, makes it easy for scrapers to distribute
their fetches across most of Google's data centers.
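For readers unfamiliar with throttling, the sketch below shows one common way a centralized service could limit activity from a single IP: a sliding-window counter per address. This is a generic illustration, not a description of how Yahoo or Google actually do it; the class name `PerIpThrottle` and its parameters are assumptions.

```python
from collections import defaultdict, deque


class PerIpThrottle:
    """Allow at most max_requests per window_seconds for each IP address."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` and say whether it is allowed."""
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # throttled: too many recent requests from this IP
        hits.append(now)
        return True


if __name__ == "__main__":
    throttle = PerIpThrottle(max_requests=2, window_seconds=10.0)
    for second in (0.0, 1.0, 2.0, 10.0):
        print(second, throttle.allow("203.0.113.5", now=second))
```

A counter like this only works when all requests from one address pass through the same bookkeeping, which is why a centralized system can throttle more easily than a widely distributed one.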
Setting up a John Doe fetch is quite easy. All you need are
CGI privileges on Mr. Doe's server. It's easiest to just share someone's
account. Dedicated IP hosting is best for this. There is no need for DNS
name service from Mr. Doe, and no lookup delays.
When you get a search request, instead of forking to one
of Google's IP addresses, you fork to Mr. Doe's CGI program. This program
on Mr. Doe's site is a subset of the source code already available. Mr.
Doe does the fetch from the list of Google IP addresses, and then immediately
spits out that same file back to you, and deletes the file. It all happens
without dropping the connection between your scraper and Mr. Doe. You parse
this file on your public site as if it arrived directly from Google. There
could easily be more than one Mr. Doe. Evil hackers could even use a network
of zombie PCs.
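The relay step can be outlined in a few lines. This is a structural sketch only, not the released proxy source: `make_relay` and the injected `fetch` callable are hypothetical names, the upstream addresses are placeholders from a documentation range, and no network access happens here. In the released code the fetch is done by curl or wget.

```python
from itertools import cycle
from typing import Callable, Iterable


def make_relay(
    upstream_addresses: Iterable[str],
    fetch: Callable[[str, str], str],
) -> Callable[[str], str]:
    """Build a relay that rotates through upstream addresses.

    `fetch(address, query)` is supplied by the caller and returns the raw
    results page; the relay hands that page straight back without storing it.
    """
    addresses = list(upstream_addresses)
    if not addresses:
        raise ValueError("at least one upstream address is required")
    rotation = cycle(addresses)

    def relay(query: str) -> str:
        address = next(rotation)  # spread requests across the address list
        return fetch(address, query)

    return relay


if __name__ == "__main__":
    def fake_fetch(address: str, query: str) -> str:
        # Stand-in for the real fetch; returns a recognizable string.
        return f"results for {query!r} via {address}"

    relay = make_relay(["192.0.2.1", "192.0.2.2"], fake_fetch)
    print(relay("first search"))
    print(relay("second search"))
    print(relay("third search"))
```

The public site would then parse whatever `relay` returns, exactly as if the page had arrived directly.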
What would Google need to find Mr. Doe? This is guesswork,
but it seems that Google would need software at all of their data centers
that can be switched in or out in real time. This software would scan incoming
search terms. If there's a match with a secret term sent out on your proxy
by some Google undercover cop using your interface, then the software would
report back that this term was logged at such-and-such data center, from
such-and-such IP address. Now Google knows whom to block. They do have an
IP blocking capability across all data centers, but we suspect that they
don't yet have this sort of search-term interception and reporting capability.
The reason the software would have to be switchable is because this scanning
is CPU-intensive for Google, and it only needs to run on rare occasions.
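The interception idea guessed at above amounts to scanning logged searches for a planted term. A minimal sketch, assuming a simple log record with a data center name, a source IP, and the search terms; the record type `LoggedSearch`, the function `find_canary_sources`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LoggedSearch:
    data_center: str
    source_ip: str
    terms: str


def find_canary_sources(
    log: Iterable[LoggedSearch], canary: str
) -> list[tuple[str, str]]:
    """Return sorted (data_center, source_ip) pairs whose search contained the canary."""
    needle = canary.lower()
    matches = {
        (entry.data_center, entry.source_ip)
        for entry in log
        if needle in entry.terms.lower()
    }
    return sorted(matches)


if __name__ == "__main__":
    log = [
        LoggedSearch("dc-east", "198.51.100.7", "weather tomorrow"),
        LoggedSearch("dc-west", "203.0.113.9", "zxq-secret-marker recipes"),
        LoggedSearch("dc-east", "198.51.100.7", "train times"),
    ]
    print(find_canary_sources(log, "zxq-secret-marker"))
```

Comparing every incoming search against a marker like this is the per-query cost that the paragraph above describes as CPU-intensive.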
If Google blocks us, we plan to take our Yahoo scraper out
of retirement within 24 hours as a substitute for Google's results, and
think about what we should do next. Yahoo's bloated interface requires four
times more bytes per fetch than Google's www.google.com/ie interface, and
this would be a sad day for us.
The worst-case scenario we can think of would involve a two-pronged attack
by Google. The first prong would be a legal effort by Google to stop us.
We welcome this, and believe that we can prevail even though our market
cap at PIR is somewhat less than Google's $50 billion. The second prong
would be to block us once again. Currently our proxy is doing the Google
fetch from the same Class C that our domains are on. This is an invitation
for a block; it would take Google about 20 minutes to identify our fetcher's
IP address.
The larger issue here is that the commercialization of the
web became possible only because tens of thousands of noncommercial sites
made the web interesting in the first place. All search engines should make
a stable, bare-bones, ad-free, easy-to-scrape version of their results available
for those who want to set up nonprofit repeaters. Even if it cuts into their
ad profits slightly, there's no easier way to give back some of what they
stole from us.
*****
We no longer recommend Firefox
Sold out: We once had a link behind the image, to download
Firefox. But they sold their soul, and we no longer recommend them.
In June 2005, we read that a Silicon Valley blogger with
alleged insider information was reporting that the Mozilla Foundation was
raking in $30 million annually from their Google connection. To substantiate
this figure, we asked the tax-exempt Foundation for a copy of their Form
990. They are required by law to provide copies. We want the correct figure
for their 2004 Google income, and are also curious about whether they filed
a 990-T to pay taxes on this sum as "unrelated business income."
The Foundation tells us that they have filing extensions
that give them until November 2005 to file this form, and no information
is currently available. Various officers have declined to comment on their
Google income to reporters over the past several months. Their 2003 form
shows total revenue of $2.4 million from donations that helped Mozilla Foundation
get started, and that seems reasonable. But if we're talking about tens
of millions from Google in 2004, this changes the character of their operation
considerably.
On August 3, 2005, the Foundation announced that they are
restructuring by spinning off the Mozilla Corporation, a for-profit subsidiary.
This tends to confirm the rumors about tens of millions of dollars from
Google. We sent emails to Mr. Mitch Kapor and Ms. Mitchell Baker, the Chair
and President, asking for the two items that will appear on the Form 990
in November. It looks like the Foundation is buying time to get their legal
affairs in order, and we are not likely to get any answers.
Apparently the bulk of the money from Google is due to Mozilla's
agreement to make Google the default engine in the Firefox search box. When
a Firefox user clicks on an ad from a Google-box search, Mozilla gets a
cut of Google's profit. A couple of months ago it was discovered that Google
is also prefetching the top result for all searches done from the Google
search box. This means you end up with cookies from sites you never visit,
and much bandwidth is wasted in the process. Fortunately, you can disable
this "feature" by entering about:config in the address bar and
then scrolling down to network.prefetch-next and toggling it to false. You
can also change the default search box to any of nearly 2,000 plug-ins that
can be downloaded from Mozilla.
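The same preference can also be set without clicking through about:config, by adding a line to a user.js file in the Firefox profile directory. The sketch below generates that line. The helper names `render_user_pref` and `append_user_pref` are illustrative, and the profile path is something you would have to look up on your own machine.

```python
import json
from pathlib import Path


def render_user_pref(name: str, value) -> str:
    """Format one preference line in the syntax Firefox reads from user.js."""
    # json.dumps gives double-quoted strings and lowercase true/false.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def append_user_pref(user_js: Path, name: str, value) -> None:
    """Append a preference line to the given user.js file."""
    with user_js.open("a", encoding="utf-8") as handle:
        handle.write(render_user_pref(name, value) + "\n")


if __name__ == "__main__":
    # Prints the line that turns off link prefetching.
    print(render_user_pref("network.prefetch-next", False))
```

Calling `append_user_pref` with the path to your profile's user.js writes the line for you; Firefox applies it the next time it starts.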
There are other Google connections in Firefox. If you enter
search terms in the location bar instead of a web URL address, Firefox goes
to Google and picks off the top link, and takes you directly to that site.
A surprising percentage of web surfers don't know the difference between
a location bar and a search box, which makes this a major concession
to Google. If you try the same thing in Explorer, you get a search preview
from MSN, but you aren't sent directly to the top site. Microsoft's behavior
is less intrusive because it gives the user more options, and therefore
has less of an impact on traffic patterns. Google and Firefox are behaving
the way that Microsoft used to behave in the days when it forced manufacturers
to bundle certain software. This behavior is unacceptable.
We no longer feel good about linking to Firefox, even though
they present an alternative to Explorer that is open-source, more secure,
and generally more configurable. It seems to us that Mozilla Foundation
came to a fork in the road all Google-eyed, and chose the wrong path.