The Community Forums


sitemap problem

Discussion in 'General Discussion' started by paulmulder, Dec 19, 2007.

  1. paulmulder

    paulmulder Member

    I apologise in advance if I have chosen the wrong category, but I wonder if anyone can help with a sitemap problem I have. When I use the option that reads a urllist.txt, everything works fine.

    However, I need to make a sitemap for a website that has far too many pages for me to create a urllist first.

    I have adapted the Google example config file and made a mistake somewhere in it. The problem is that I can't tell where the mistake is.

    I am using:

    <?xml version="1.0" encoding="UTF-8"?>

    -->
    <directory path="/home/myusername/public_html" url=http://www.mydomainname.xyz />
    <directory
    path="/home/myusername/public_html"
    url="http://www.mydomainname.xyz"
    default_file="index.htm" />

    </site>

    The error I get is:

    Traceback (most recent call last):
      File "sitemap_gen.py", line 2199, in ?
        sitemap = CreateSitemapFromFile(flags['config'], suppress_notify)
      File "sitemap_gen.py", line 2147, in CreateSitemapFromFile
        xml.sax.parse(configpath, sitemap)
      File "/usr/local/lib/python2.4/xml/sax/__init__.py", line 33, in parse
        parser.parse(source)
      File "/usr/local/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/local/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "/usr/local/lib/python2.4/xml/sax/expatreader.py", line 207, in feed
        self._parser.Parse(data, isFinal)
      File "/usr/local/lib/python2.4/xml/sax/expatreader.py", line 300, in start_element
        self._cont_handler.startElement(name, AttributesImpl(attrs))
      File "sitemap_gen.py", line 2031, in startElement
        self._inputs.append(InputDirectory(attributes, self._base_url))
      File "sitemap_gen.py", line 877, in __init__
        if not url.startswith(base_url):
    TypeError: expected a character buffer object

    Does anyone have any suggestions?
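
A note on the traceback above: the failing line is `url.startswith(base_url)`, and a TypeError there is consistent with `base_url` not being a string (for example `None`) when the `<directory>` element is read. That would happen if no `<site base_url="...">` element had been seen first. This is an inference from the traceback, not from the script source. The Python 3 sketch below is a hypothetical pre-flight check, not part of sitemap_gen.py; the function name `check_config` and the messages are made up for illustration. It only tests that the config parses, that the root is `<site>` with a `base_url`, and that every `<directory>` `url` starts with that `base_url`.

```python
import xml.etree.ElementTree as ET


def check_config(xml_text):
    """Return a list of problems found in a sitemap_gen-style config string.

    Hypothetical helper: it mirrors the startswith() test seen in the
    traceback, but is not taken from sitemap_gen.py itself.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Unquoted attribute values, stray "-->" and unmatched tags end up here.
        return ["not well-formed XML: %s" % err]

    if root.tag != "site":
        return ["root element is <%s>, expected <site>" % root.tag]

    problems = []
    base_url = root.get("base_url")
    if not base_url:
        problems.append("<site> has no base_url attribute")

    for node in root.iter("directory"):
        url = node.get("url")
        if url is None:
            problems.append("<directory> has no url attribute")
        elif base_url and not url.startswith(base_url):
            problems.append(
                "directory url %r does not start with base_url %r" % (url, base_url)
            )
    return problems


# Placeholder config, not a real site.
SAMPLE = (
    '<site base_url="http://www.example.xyz/" store_into="/tmp/sitemap.xml">'
    '<directory path="/tmp" url="http://www.example.xyz/" default_file="index.htm" />'
    "</site>"
)
problems = check_config(SAMPLE)
```

In the config as posted, no opening `<site ...>` tag is visible and the first `url` value is unquoted, so a check like this would stop at the parse step before it ever compared URLs.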
     
  2. capitalwest

    capitalwest Registered

    Try this. I use this for generating sitemaps by getting the script to crawl all directories and files.

    Just change the username and domain name to your own, and set default_file to the index file you're using (e.g. index.html or index.php). Copy everything from <?xml to </site>.

    <?xml version="1.0" encoding="UTF-8" ?>
    <!--
    sitemap_gen.py example configuration script

    This file specifies a set of sample input parameters for the
    sitemap_gen.py client.

    You should copy this file into "config.xml" and modify it for
    your server.


    *********************************************************

    -->
    <!--
    ** MODIFY **
    The "site" node describes your basic web site.

    Required attributes:
    base_url - the top-level URL of the site being mapped
    store_into - the webserver path to the desired output file.
    This should end in '.xml' or '.xml.gz'
    (the script will create this file)

    Optional attributes:
    verbose - an integer from 0 (quiet) to 3 (noisy) for
    how much diagnostic output the script gives
    suppress_search_engine_notify="1"
    - disables notifying search engines about the new map
    (same as the "testing" command-line argument.)
    default_encoding
    - names a character encoding to use for URLs and
    file paths. (Example: "UTF-8")


    -->
    <site base_url="http://www.yourdomain.xyz/" store_into="/home/yourusername/public_html/sitemap.xml" verbose="1">
    <!--
    ********************************************************
    INPUTS

    All the various nodes in this section control where the script
    looks to find URLs.

    MODIFY or DELETE these entries as appropriate for your server.
    *********************************************************

    -->
    <!--
    ** MODIFY **
    "directory" nodes tell the script to walk the file system
    and include all files and directories in the Sitemap.

    Required attributes:
    path - path to begin walking from
    url - URL equivalent of that path

    Optional attributes:
    default_file - name of the index or default file for directory URLs


    -->
    <directory path="/home/yourusername/public_html" url="http://www.yourdomain.xyz/" default_file="index.php" />
    <!--
    ********************************************************
    FILTERS

    Filters specify wild-card patterns that the script compares
    against all URLs it finds. Filters can be used to exclude
    certain URLs from your Sitemap, for instance if you have
    hidden content that you hope the search engines don't find.

    Filters can be either type="wildcard", which means standard
    path wildcards (* and ?) are used to compare against URLs,
    or type="regexp", which means regular expressions are used
    to compare.

    Filters are applied in the order specified in this file.

    An action="drop" filter causes exclusion of matching URLs.
    An action="pass" filter causes inclusion of matching URLs,
    shortcutting any other later filters that might also match.
    If no filter at all matches a URL, the URL will be included.
    Together you can build up fairly complex rules.

    The default action is "drop".
    The default type is "wildcard".

    You can MODIFY or DELETE these entries as appropriate for
    your site. However, unlike above, the example entries in
    this section are not contrived and may be useful to you as
    they are.
    *********************************************************

    -->
    <!-- Exclude URLs that end with a '~' (IE: emacs backup files)
    -->
    <filter action="drop" type="wildcard" pattern="*~" />
    <!-- Exclude URLs within UNIX-style hidden files or directories
    -->
    <filter action="drop" type="regexp" pattern="/\.[^/]*" />
    <!-- Exclude URLs that end with a '.xml' extension
    -->
    <filter action="drop" type="wildcard" pattern="*.xml" />
    </site>
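
The filter rules described in the comments above (applied in file order, first match decides, "pass" short-circuits later filters, unmatched URLs are included) can be sketched in Python. This is only an illustration of those stated rules, not the script's own code: the function name `url_is_included` and the tuple layout are made up, and using `fnmatch.fnmatchcase` for wildcards and `re.search` for regexps is an assumption about how the matching is done.

```python
import fnmatch
import re


def url_is_included(url, filters):
    """Decide whether a URL stays in the Sitemap.

    filters is a list of (action, type, pattern) tuples in config-file order.
    Assumption: wildcard patterns use fnmatch rules and regexp patterns are
    searched anywhere in the URL.
    """
    for action, kind, pattern in filters:
        if kind == "regexp":
            matched = re.search(pattern, url) is not None
        else:
            matched = fnmatch.fnmatchcase(url, pattern)
        if matched:
            # First matching filter wins: "pass" keeps the URL, "drop" excludes it.
            return action == "pass"
    # No filter matched, so the URL is included.
    return True


# The three filters from the config above, in the same order.
FILTERS = [
    ("drop", "wildcard", "*~"),
    ("drop", "regexp", r"/\.[^/]*"),
    ("drop", "wildcard", "*.xml"),
]

kept = url_is_included("http://www.yourdomain.xyz/index.php", FILTERS)
dropped = url_is_included("http://www.yourdomain.xyz/.htaccess", FILTERS)
```

Putting a "pass" filter ahead of the "drop" filters is how you would whitelist one URL that a later pattern would otherwise exclude.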
     