Do I need a site map?

My head is spinning about site maps and I wonder if we even need one. The issue is that on our site we have a few folders of documents that we want our Search function to access. (We're using Google Custom Search.) The documents do not have any links in or out. The only way our users will find them is by search. So can we submit the URLs of the folders to Google instead of making a site map? The contents of the folders will change, so we don't want to have to resubmit site maps all the time. And there are more than 500 pages, so the online site map generators don't work.

The site just launched so it is too soon to know for sure that Google is not going to find them on its own.

I'm groaning at the idea of buying, learning, and regularly using a site map generating program.

If your documents do not have links to them then there is going to be no way that Google can find them. :) Google indexes sites by following links, either on your site or from other sites, that lead to files/documents on your site.

What type of documents are these? Are they HTML files or something else? Is there a reason you don't want to have links leading to them?

The easiest way to create a listing of files in a directory would be with a .htaccess file that you place in the directory where you want the list. Let's say you have a directory with images in it and you want a list of those images, each linked to the file itself. Say something like this: http://www.puppetsandstuff.com/banners/

I created that listing by placing this in a .htaccess file that is uploaded to that directory:

Options +Indexes
IndexIgnore *.html *.php
HeaderName /banners/HEADER
ReadmeName /banners/README
IndexOptions +SuppressDescription +SuppressHTMLPreamble

Let's go through that code line by line.

Options +Indexes: This tells Apache that you want a list of files created in the directory.
IndexIgnore *.html *.php: This says I don't want it to list files that end in .html or .php.
HeaderName /banners/HEADER: This tells it that I have a file called HEADER that I want it to output before the listing.
ReadmeName /banners/README: This tells it that I have a file called README that I want it to output after the listing.
IndexOptions +SuppressDescription +SuppressHTMLPreamble: This limits the amount of information that is output with the listing since I wanted a very simple listing. SuppressDescription drops the description column, and SuppressHTMLPreamble leaves the opening HTML to the HEADER file.

The last three lines really just have to do with how the page looks. I wanted this page to match my main site a bit. You're actually using mod_autoindex to do this, which is part of Apache. There is more information on what you can do with it here: http://httpd.apache.org/docs/2.0/mod/mod_autoindex.html
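If a host turns out not to allow Options +Indexes, roughly the same kind of page could be built as a static file instead. The following is only a sketch of that fallback in Python, not what mod_autoindex does internally. The function name build_listing and the default ignore patterns are made up for illustration; the patterns loosely mirror the IndexIgnore line and also skip the HEADER, README, and dot files.

```python
import fnmatch
import html
from pathlib import Path
from urllib.parse import quote


def build_listing(directory, ignore=("*.html", "*.php", ".*", "HEADER", "README")):
    """Return a simple HTML list with one link per file in `directory`.

    `ignore` holds shell-style patterns for file names to leave out
    (hypothetical defaults, chosen to resemble the IndexIgnore line).
    """
    names = sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not any(fnmatch.fnmatch(p.name, pat) for pat in ignore)
    )
    items = [
        # quote() makes the name safe inside a URL, escape() inside the page text
        f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>'
        for name in names
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"
```

Unlike the .htaccess listing, a page produced this way is a snapshot, so the script would have to be rerun whenever the folder contents change.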

What you might want to do is simply use this in your .htaccess file to start with and see if it fits your needs.

Options +Indexes

Shawn, thank you for that info. If I understand, with an .htaccess file we could allow people to see a list of all the documents in the folder, and the names would be clickable, which creates the links that Google needs in order to index our documents. (The reason the documents don't have any links is that they are contracts, still in .doc or .pdf format, not html files. There are way too many of them to convert and add links. We think they should get indexed by Google because that folder is pointed to in our custom search function.)

The actual page is http://www.ibew1245.com/PGEcontractLibrary.html

I have used "Google Custom Search" to point the first search box to a folder of documents, and it works. (For example, search for "hard hat" and you get 3 documents in results.)
But the second search box is not finding any of the pdf documents in the folder I pointed it to. (I know the pdfs are searchable, and the path is correct.)
So to use your idea, I could create an .htaccess file and upload that to the contracts folder, then add a link on my page that says "complete list of contracts" with the URL to the access file, and they would get a simple directory, like your example. This would at least give access to the documents and the list would always be current.

What do you think?

Kathy

Kathy,

More than likely the reason you get the three hits on the first search box is that there are links someplace else that lead to those files. I have not really dug into the Google Custom Search feature that much, but the bottom line is that in order for Google to index the files in a directory on your site it must follow links to them. :)

So yes, if you upload a .htaccess file to each directory that you want a listing in and then create a link to that directory, you get the listing. You do not link to the actual .htaccess file, just to the directory you have placed it in. A link like http://www.ibew1245.com/ContractIndex/ would give you a listing if you had a .htaccess file in the ContractIndex directory.

If you wanted to place that list of links into the look of your site you could "split" the HTML code of contractindex-results.html between a HEADER and README and upload those to the same directory. You split the code right after:

<div id="mainContent-noSidebars" class="newsText">

That line and everything above it you would put in the HEADER file, and everything after that point you would put in the README file. :) Hope that makes sense to you.
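The split described here could also be done with a few lines of Python rather than by hand. This is just a sketch; the function name split_at_marker is made up, and it assumes the marker div sits on a line of its own in the page.

```python
def split_at_marker(page_html, marker):
    """Split page_html right after the first line that contains marker.

    Returns (header, readme): header ends with the marker line,
    readme holds everything that follows it.
    """
    lines = page_html.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if marker in line:
            return "".join(lines[: i + 1]), "".join(lines[i + 1 :])
    raise ValueError(f"marker not found: {marker!r}")


# The marker is the div quoted in the post above.
MARKER = '<div id="mainContent-noSidebars" class="newsText">'
```

Calling split_at_marker on the text of contractindex-results.html with MARKER would give the two pieces to save as HEADER and README. Any old page content that follows the marker ends up in the README piece and would probably need trimming by hand.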

Sorry to keep posting. I just noticed that you have a partial listing of the docs on your site. That is how Google found those files. I checked the Letters of Agreement page, which seems to list the PDF files, and the links on that page are invalid. In other words, they return a 404 error, meaning the files are not where they are supposed to be. That could be why the PDF search does not work. Double-check the validity of the links and the paths you are using.
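Checking a page full of links by hand gets tedious, so here is a rough Python sketch of how it might be automated with the standard library only. The names LinkCollector, extract_links, and link_status are made up for illustration, and the sketch assumes the server answers HEAD requests, which some servers do not.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative paths against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(page_html, base_url):
    """Return all link targets found in page_html, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return collector.links


def link_status(url, timeout=10):
    """Return the HTTP status for url, or None if the server is unreachable."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # for example 404 when the file is missing
    except URLError:
        return None
```

Feeding the page's HTML and URL to extract_links and then calling link_status on each result would flag every link that does not come back as 200.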

Shawn, thanks for spotting that page of bad links - those files were removed but they forgot about that page. That's fixed now.

The .htaccess idea is working GREAT on my testing site (www.kifergraphics.com/TESTS/) but is not working on the real site, which would be www.ibew1245.com/PGE-docs/. I can tell the path is correct because I can open the header (www.ibew1245.com/PGE-docs/HEADER.html) and the icons. We tried stripping out everything in the .htaccess file except:
Options +Indexes

but it still doesn't work. Do you have any idea why not?

Thanks for being so helpful. I think we are really close to getting this.

Kathy

Is ibew1245 a secondary domain that has been added to an account? Perhaps kifergraphics.com is the main site on that account? If so you need to edit the httpd.conf file for the account. See this thread for information on that process. http://forums.westhost.com/showthread.php?t=14050

No, it is not a subdomain, but that was a good thought. They are a separate domain also hosted by WestHost.

Is there something else that would block an .htaccess file?

So the domain that is not working is on a WestHost account by itself and the public root is located at /var/www/html? It is a WH 3.0 account, right? Do double-check the .htaccess file itself to make sure it has no typos or mistakes. What all do you have in the directory you are trying to do this in? There is nothing else that I can think of that would be blocking the use of .htaccess. You could also shoot a ticket off to WestHost and ask them if they can see anything wrong. While technically this is not something they would support, they could at least confirm whether there is something in your account that would keep this from working.

Hooray, we got it working! It was just a glitch in the file name after all.

Here it is: http://www.ibew1245.com/PGE-docs/

Thanks so much for your patient help, Shawn. This .htaccess thing is a great tool to know about.

Looks good! May I suggest adding your full header with menu items and perhaps even your left and right column stuff to the page. That could be a big task, so at the least you might want to consider a link back to your main site. There is a possibility that folks will find this page first via a search and you want them to be able to easily access the rest of the site. :)

Good idea, Shawn. I added the navigation, but the sidebars pushed my file list below them. That's okay, sidebars aren't needed on this page.

Thanks again for walking me through this. Everyone is pleased to have this document list available.

Create your sitemap using http://www.xml-sitemaps.com/; it really works well.
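For a folder with more than 500 documents that keeps changing, a small script run on a copy of the folder might be easier than an online generator. Below is a minimal sketch in Python, assuming the documents sit in a local directory that maps onto one base URL. The function name build_sitemap, its parameters, and the default extensions are hypothetical; base_url is expected to end with a slash.

```python
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_sitemap(directory, base_url, extensions=(".pdf", ".doc")):
    """Return sitemap XML listing every matching file under `directory`.

    base_url is the public URL of that directory and must end with "/".
    """
    root = Path(directory)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            # Turn the local path into a URL path, escaping spaces and the like.
            relative = quote(path.relative_to(root).as_posix())
            entries.append(f"  <url><loc>{escape(base_url + relative)}</loc></url>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )
```

The returned text would be saved as sitemap.xml and submitted to Google. The sitemap protocol allows up to 50,000 URLs per file, so a few hundred documents fit comfortably in one.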









