During one of my previous jobs, I was tasked with putting up a web
page where we could post up job openings. The person making the request
didn't want the page indexed by search engines because they didn't want random people applying for the jobs posted there. They wanted to be able to hand out the web address to others so those people could get the information on the job and apply. Easy enough: create a robots.txt file, add a <meta> tag stating that the page shouldn't be indexed, and voila... finished.
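(For the curious, the whole setup amounts to something like the lines below. The /jobs/ path is just a made-up example for illustration, not the real address.)

In robots.txt at the site root:

    User-agent: *
    Disallow: /jobs/

And in the <head> of the page itself:

    <meta name="robots" content="noindex" />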
A similar thing had been done before I started working there. One of the websites was telling the Google bots not to crawl or index it. However, a link to that website existed on the main website. If you googled the name of the site that wasn't supposed to be indexed, you got a result for the page that linked to it. Well, the department that set it up was annoyed by this and decided to tell us that they were going to make it indexable by Google, since people could find it anyway through this method. What?!?
I guess the other department was told that the site should not be findable by anyone other than those who needed to go to it. This makes sense: you only need certain people to go to it, so you should try to reduce the number of people visiting it to cut bandwidth costs (I'm not sure if that's what they were thinking about, but it sounds solid to me). The other department took that to mean that the website shouldn't be found at all by searching Google. That's a huge misunderstanding, not of what my department wanted, but of what is actually achievable. Any website put onto the Internet with no login feature is out in the public. Any site like this can be "found"; you just have to play the world's largest game of Where's Waldo? to find it. (A site with a login feature can be found as well; you just can't get inside the website... in theory.) The reason for this is that
you really have no control over the Internet, the other people on the
Internet, or how the Internet works.
Yes, you can
request that search engines do not crawl or index your website using a
robots.txt file or by using the <meta name="robots" content="noindex"
/> tag. However, you are only ASKING the crawler not to index your site. Generally, crawlers comply with the request; however, I'm sure that somewhere out there, there is a search engine that ignores these requests and just indexes everything anyway. As a
webmaster/developer/designer, you don't control the behavior of things
that aren't yours. Another thing you can't control is human behavior.
Anyone can link to your "hidden" website on any page they control, or
any page that will let them do so, creating the situation that I
described above.
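To make the "asking" part concrete, here is a rough sketch in Python of how a well-behaved crawler honors robots.txt (example.com and the /jobs/ path are stand-ins, not a real target). The point is that the check happens in the crawler's own code, so honoring it is entirely the crawler's choice:

    from urllib.robotparser import RobotFileParser

    # A polite crawler downloads your robots.txt and consults it
    # before requesting each page...
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()
    print(rp.can_fetch("*", "https://example.com/jobs/"))

    # ...but that check is voluntary. A crawler that never calls
    # can_fetch() just requests and indexes the page anyway;
    # robots.txt itself doesn't block anything.

Nothing in robots.txt or the <meta> tag reaches out and stops anyone; it all depends on the visitor choosing to cooperate.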
The situation where the website was found through another webpage is not as bad as the other department made it out to be. The only people who are going to get to the website through that route already know that the website exists. How else would they know exactly what search term to put into Google to find it? If they know it exists, it is reasonable to assume that they need to get to the site, or have been to it already. If you hand somebody the web address, you can't control the fact that they decide to visit it 200 times a day, either.
If you absolutely need the information on the website to not be found on the Internet at all, that site needs a login function. Or, even better, don't create a website for it at all; just send out emails to those who need it (if the audience is small enough and the list of email addresses is manageable).
I reserve the right to be wrong on this subject, and I appreciate any and all comments about it.