Google & Big Brother (Sep. 7, 2005)


By Ken Adachi <Editor@educate-yourself.org>
http://educate-yourself.org/cn/googleandbigbrother07sep05.shtml
September 7, 2005
Thanks to this e-mail sent by George Z., I am now aware that using www.google.com for searches is probably not the wisest thing to do, since they apparently extract information about you and TRACK what you search for, undoubtedly to wind up in Big Brother data storage computers. After you read George's intro letter, you can read the articles from scroogle.org that follow. They explain the technicalities involved in stripping tracking information from Google searches. I'll be changing the Search page at my web site soon, replacing the link to Google with a link to another, non-tracking search engine. Here are a few to try:
1. Clusty.com (http://clusty.com/)
2. Google search, but without the Tracking (http://www.scroogle.org/cgi-bin/scraper.htm)
3. Yahoo search, but without the Tracking (http://www.scroogle.org/scraper7.html)

Hi Ken,
Your site (http://educate-yourself.org) can be considered a “public” site, and the use of the built-in “Google Search” (from the search results page) could be considered in a different light. I will, however, never (knowingly) use this tool – not even from your site.
I still think that this use is potentially entrapping to the uninformed - as it implies that it is "OK" to install the Google toolbar on my personal machine or that the use of Google is "safe."

Granted, they now make disclosures (for the things that they previously got caught at) in the agreement You “sign” when installing the tool – but do You read these? You should read them very carefully (and understand the implications).

Thanks again Ken, for all you do

Best regards,

George Z.

*****
[Note to RBN (http://www.rbnlive.com/)]
Subject: The promotion of (personal data spider) Google on the various programs

Comments: Greetings to all of You! Permit me to begin by thanking all of You for the great fight! – The war is not yet won!

As background to the subject, I recommend that you visit http://scroogle.org/ and http://www.google-watch.org/. Here you will find a wealth of (eye-opening) information, including: “Google as Big Brother”, and who-is-who in their organization.

I would like to suggest that you consider not using “Google” as a verb or making any reference to this site – but rather consider the proposed alternate. We all spend a lot of time and money eliminating ad-ware and such – so why would we open the doors wide and invite this one in? I am sure that you would not promote the use of “frequent buyer” cards at the local grocery super-store; this is [in my humble opinion] much worse.

You may say in reply that: “Google is the best and will ensure Me of the best search”; however, the proposed alternate does, in fact, use “Google” [and / or “Yahoo”] – only it strips the transfer of ads and the possibility of generating user profiles including personal information / preferences and tracking data.

For all searches they [Google] record the cookie ID, your Internet IP address, the time and date, your search terms, and your browser configuration. Increasingly, Google is customizing results based on your IP number. This is referred to in the industry as "IP delivery based on geolocation." This information is retained indefinitely! Google hires spooks: a key Google engineer used to work for the National Security Agency. The newly-commissioned data-mining bureaucrats in Washington can only dream about the sort of slick efficiency that Google has already achieved. (See http://www.google-watch.org/bigbro.html)

To use “Scroogle” you would first have to go to their site [use a bookmark] and then enter your search keywords [rather than just searching from the address area or the default search engine of your browser (automatically configured by another “Big Brother” for You)].

The same goes for the installation of the “Google Search Tool Bar” on Your personal browser. They already have a ~95% share (a monopoly) of web-searching. This is the (proverbial) inviting of the “Wolf” into the “Fold” scenario! It also establishes an open link with (Your hard drive and) the “Wolf Pack”!

In the worst case, I suggest that You use generic terms (in your broadcasts) for the web-search function.

Please feel free to pass this on, and to use this information on your broadcasts and web sites.

Best regards to all of You; Keep up the good fight!

George

Scraping and ad-stripping Google's results

http://www.scroogle.org/gscrape.html

If done in the public interest and not for profit, it's legal. What's more, Google can't block you if they can't find you.

Public Information Research, Inc., the nonprofit public charity behind http://www.google-watch.org and http://www.scroogle.org, has been running a Google proxy for more than two years. On January 3, 2005 we released the source code for our proxy. Our review of the legal situation has convinced us that we are covered by "fair use" under the Copyright Act.

This step that we have taken has implications for all search engines. These engines crawl the public web without asking permission, cache and reproduce the content without asking permission, and then use this information as a carrier for ads that generate private profit. We are convinced that it is legal for citizens to scrape Google, strip the ads, and make the scraped results available as a nonprofit public service. This is especially the case if there are public policy concerns behind the scraping.

Google Watch has been the most prominent critic of Google's outrageous privacy policies for more than two years. This is why we started the proxy, and it's why we continue the proxy. We invite Google to serve us with a cease and desist letter as a first step toward resolving this issue. So far, we have yet to hear from Google's lawyers. By releasing the source code for our proxy, we're trying to escalate the issue.

If it can be established that what we're doing is legal -- or at least sufficiently legal so that Google is not eager to challenge us -- then this will begin to restore a public-interest balance to the web that has been declining ever since big money got behind the dot-coms.

There is the additional problem of whether anyone who scrapes Google can avoid getting blocked by Google. We experienced this when Google blocked Scroogle in December, 2003. We moved to a different server and continued as before, because Google could no longer find us. In our opinion, it's legal for Google to block whomever they want, even while it's also legal for us to scrape them if we can.

If the scraping is done properly, it is not worth Google's trouble to find you. Our source code separates the "fetch" portion of the program, which is done by curl or wget, from the searcher interface and the parsing of the fetched results. If the fetching is done by a server on a different Class C address block (a /24 network) from the website that shows the scraped results, there is little that Google can do to find the IP address that is responsible for the actual fetch.
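To make the two ideas in the paragraph above concrete, here is a minimal sketch in Python. It is not the released Scroogle source code. The function and class names are hypothetical, and the parser simply collects link targets from an already-fetched page, which is an assumption about what "parsing" involves. The first function checks whether two IPv4 addresses sit in the same Class C (/24) block. The second shows a parse step that works on HTML text and has no knowledge of how or where the page was fetched.

```python
import ipaddress
from html.parser import HTMLParser


def same_class_c(ip_a, ip_b):
    """Return True when both IPv4 addresses fall in the same /24 block."""
    # strict=False lets us build the /24 network from any host address in it.
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical stand-in for a result parser)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_results(html):
    """Parse step only: takes HTML text that some separate fetcher produced."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Because `parse_results` only sees a string, the fetch could be done by curl, wget, or another machine entirely, which is the separation the article describes.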

*****

A Google block requires a John Doe server

Google uses a couple dozen data centers with dedicated IP addresses. A number of these are located outside the U.S. Once these addresses are discovered (search for "google data centers"), it is trivial to maintain the list. The addresses will change over time, but they won't change that quickly.

If a scraper is coming into Google from an address that is outside the local IP block where his public interface operates, we believe that Google is currently ill-equipped to discover him. Yahoo, by contrast, appears to have a more centralized system, and is able to throttle excessive activity from a single IP. We saw only two IP addresses for Yahoo when our Yahoo scraper was active. About two percent of our fetches were throttled. Google, with a more distributed system, makes it easy for scrapers to distribute their fetches across most of Google's data centers.

Setting up a John Doe fetch is quite easy. All you need are CGI privileges on Mr. Doe's server. It's easiest to just share someone's account. Dedicated IP hosting is best for this. There is no need for DNS name service from Mr. Doe, and no lookup delays.

When you get a search request, instead of forking to one of Google's IP addresses, you fork to Mr. Doe's CGI program. This program on Mr. Doe's site is a subset of the source code already available. Mr. Doe does the fetch from the list of Google IP addresses, and then immediately spits out that same file back to you, and deletes the file. It all happens without dropping the connection between your scraper and Mr. Doe. You parse this file on your public site as if it arrived directly from Google. There could easily be more than one Mr. Doe. Evil hackers could even use a network of zombie PCs.

What would Google need to find Mr. Doe? This is guesswork, but it seems that Google would need software at all of their data centers that can be switched in or out in real time. This software would scan incoming search terms. If there's a match with a secret term sent out on your proxy by some Google undercover cop using your interface, then the software would report back that this term was logged at such-and-such data center, from such-and-such IP address. Now Google knows whom to block. They do have an IP blocking capability across all data centers, but we suspect that they don't yet have this sort of search-term interception and reporting capability. The reason the software would have to be switchable is because this scanning is CPU-intensive for Google, and it only needs to run on rare occasions.
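The guesswork above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything known about Google's systems. The record layout, the names, and the sample term are all hypothetical. Each logged search carries the data center that received it, the source IP address, and the query text. The scan reports every (data center, IP) pair whose query contains the secret term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchLogEntry:
    """Hypothetical log record: where a query arrived and where it came from."""
    data_center: str
    source_ip: str
    query: str


def find_canary_sources(entries, canary_term):
    """Return (data_center, source_ip) pairs whose query contains the canary term."""
    needle = canary_term.lower()
    hits = []
    for entry in entries:
        # Case-insensitive substring match against every incoming query.
        if needle in entry.query.lower():
            hits.append((entry.data_center, entry.source_ip))
    return hits
```

Every incoming query has to be compared against the term, which is why the article expects such scanning to be CPU-intensive and switched on only when needed.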

If Google blocks us, we plan to take our Yahoo scraper out of retirement within 24 hours as a substitute for Google's results, and think about what we should do next. Yahoo's bloated interface requires four times more bytes per fetch than Google's www.google.com/ie interface, and this would be a sad day for us.

The worst-case scenario we can think of would involve a two-pronged attack by Google. The first prong would be a legal effort by Google to stop us. We welcome this, and believe that we can prevail even though our market cap at PIR is somewhat less than Google's $50 billion. The second prong would be to block us once again. Currently our proxy is doing the Google fetch from the same Class C that our domains are on. This is an invitation for a block; it would take Google about 20 minutes to identify our fetcher's IP address.

The larger issue here is that the commercialization of the web became possible only because tens of thousands of noncommercial sites made the web interesting in the first place. All search engines should make a stable, bare-bones, ad-free, easy-to-scrape version of their results available for those who want to set up nonprofit repeaters. Even if it cuts into their ad profits slightly, there's no easier way to give back some of what they stole from us.

*****

We no longer recommend Firefox

[Image: Sold out]

We once had a link behind the image, to download Firefox. But they sold their soul, and we no longer recommend them.

In June 2005, we read that a Silicon Valley blogger with alleged insider information was reporting that the Mozilla Foundation was raking in $30 million annually from their Google connection. To substantiate this figure, we asked the tax-exempt Foundation for a copy of their Form 990. They are required by law to provide copies. We want the correct figure for their 2004 Google income, and are also curious about whether they filed a 990-T to pay taxes on this sum as "unrelated business income."

The Foundation tells us that they have filing extensions that give them until November 2005 to file this form, and no information is currently available. Various officers have declined to comment on their Google income to reporters over the past several months. Their 2003 form shows total revenue of $2.4 million from donations that helped Mozilla Foundation get started, and that seems reasonable. But if we're talking about tens of millions from Google in 2004, this changes the character of their operation considerably.

On August 3, 2005, the Foundation announced that they are restructuring by spinning off the Mozilla Corporation, a for-profit subsidiary. This tends to confirm the rumors about tens of millions of dollars from Google. We sent emails to Mr. Mitch Kapor and Ms. Mitchell Baker, the Chair and President, asking for the two items that will appear on the Form 990 in November. It looks like the Foundation is buying time to get their legal affairs in order, and we are not likely to get any answers.

Apparently the bulk of the money from Google is due to Mozilla's agreement to make Google the default engine in the Firefox search box. When a Firefox user clicks on an ad from a Google-box search, Mozilla gets a cut of Google's profit. A couple of months ago it was discovered that Google is also prefetching the top result for all searches done from the Google search box. This means you end up with cookies from sites you never visit, and much bandwidth is wasted in the process. Fortunately, you can disable this "feature" by entering about:config in the address bar and then scrolling down to network.prefetch-next and toggling it to false. You can also change the default search box to any of nearly 2,000 plug-ins that can be downloaded from Mozilla.
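For readers who would rather not click through about:config, the same preference can be set from a user.js file in the Firefox profile folder, which Firefox reads at startup. The Python sketch below appends the line to that file. The function name is hypothetical, and the location of your profile folder is an assumption you must supply yourself, since it varies by system.

```python
from pathlib import Path

# Preference line in user.js syntax; the pref name is the one named in the text.
PREF_LINE = 'user_pref("network.prefetch-next", false);'


def disable_prefetch(profile_dir):
    """Append the prefetch preference to user.js unless it is already present.

    Returns True if the line was written, False if it was already there.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if PREF_LINE in existing:
        return False
    # Start on a fresh line if the existing file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with user_js.open("a", encoding="utf-8") as handle:
        handle.write(prefix + PREF_LINE + "\n")
    return True
```

Call it as `disable_prefetch("/path/to/your/profile")` with Firefox closed, then restart the browser so that the setting is picked up.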

There are other Google connections in Firefox. If you enter search terms in the location bar instead of a web address, Firefox goes to Google, picks off the top link, and takes you directly to that site. A surprising percentage of web surfers don't know the difference between a location bar and a search box, which makes this a major concession to Google. If you try the same thing in Explorer, you get a search preview from MSN, but you aren't sent directly to the top site. Microsoft's behavior is less intrusive because it gives the user more options, and therefore has less of an impact on traffic patterns. Google and Firefox are behaving the way that Microsoft used to behave in the days when it forced manufacturers to bundle certain software. This behavior is unacceptable.

We no longer feel good about linking to Firefox, even though they present an alternative to Explorer that is open-source, more secure, and generally more configurable. It seems to us that Mozilla Foundation came to a fork in the road all Google-eyed, and chose the wrong path.

