Waldo's Webpage

During one of my previous jobs, I was tasked with putting up a web page where we could post job openings. The person making the request didn't want the page indexed by search engines, because they didn't want random people applying for the jobs posted there. They wanted to be able to hand out the web address to others so those people could read about the job and apply. Easy enough: create a robots.txt file, add a <meta> tag that says the page shouldn't be indexed, and voilà....finished.
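
For anyone who hasn't set this up before, here's roughly what both pieces look like. The robots.txt file goes at the root of the site:

    # robots.txt -- ask every crawler to skip the entire site
    User-agent: *
    Disallow: /

and the tag goes in the <head> of the page itself:

    <meta name="robots" content="noindex" />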

A similar thing had been done before I started working there. One of the websites was telling the Google bots not to crawl or index it. However, a link to that website existed on the main website, so if you googled the name of the site that wasn't supposed to be indexed, you got a result for the page that linked to it. The department that set it up was annoyed by this, and decided to tell us they were going to make it indexable by Google, since people could find it through this method anyway. What?!?

I guess the other department was told that the site shouldn't be findable by anyone other than those who needed to go to it. This makes sense: you only need certain people to visit, so you should try to reduce the number of people visiting to keep bandwidth costs down (I'm not sure if that's what they were thinking, but it sounds solid to me). The other department took it to mean that the website shouldn't be findable at all by searching Google. That's a huge misunderstanding, not of what my department wanted, but of what is actually achievable. Any website put on the Internet without a login feature is out in public. Any site like this can be "found"; you just have to play the world's largest game of Where's Waldo? to find it. (A site with a login feature can be found too, you just can't get inside it....in theory.) The reason is that you have no control over the Internet, the other people on it, or how it works.

Yes, you can request that search engines not crawl or index your website using a robots.txt file or the <meta name="robots" content="noindex" /> tag. However, you are only ASKING the crawler not to index your site. Generally crawlers comply with the request, but I'm sure that somewhere out there is a search engine that ignores these requests and indexes everything anyway. As a webmaster/developer/designer, you don't control the behavior of things that aren't yours. Another thing you can't control is human behavior: anyone can link to your "hidden" website on any page they control, or any page that will let them do so, creating exactly the situation I described above.
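
To make the "asking" part concrete: a well-behaved crawler reads your robots.txt itself and voluntarily skips what it's told to skip. Here's a minimal sketch in Python using the standard library's robotparser (the bot name and URLs are made up for illustration). Nothing here is enforced by your server; the crawler simply chooses to run this check:

    from urllib.robotparser import RobotFileParser

    # A *polite* crawler fetches robots.txt and honors it, entirely voluntarily.
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # hypothetical site
    rp.read()

    if rp.can_fetch("FriendlyBot", "https://example.com/jobs.html"):
        print("Allowed: crawl the page")
    else:
        print("Disallowed: skip the page")

    # A rude crawler just skips all of the above and fetches the page anyway.
    # Nothing on your end can stop it from doing so.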

The situation where the website was found through another web page is not as bad as the other department made it out to be. The only people who are going to reach the website through that route already know it exists. How else would they know exactly what search term to put into Google to find it? And if they know it exists, then it is reasonable to assume that they need to get to the site, or have been to it already. If you hand somebody the web address, you can't control whether they decide to visit it 200 times a day either.

If you absolutely need the information on the website not to be findable on the Internet at all, the site needs a login function, or even better, don't create a website at all; just send out emails to those who need the information (if the audience is small enough and the list of email addresses is manageable).

I reserve the right to be wrong on this subject, and I appreciate any and all comments about it.
