The Community Forums

Need Apache .htaccess rewrite help

Discussion in 'EasyApache' started by sneader, Apr 18, 2011.

  1. sneader

    sneader Well-Known Member

    Joined:
    Aug 21, 2003
    Messages:
    1,126
    Likes Received:
    21
    Trophy Points:
    38
    Location:
    La Crosse, WI
    cPanel Access Level:
    Root Administrator
    Googlebot is putting a heavy load on one of my servers by hitting some strange URLs for one particular customer (poorly written shopping cart). A new cart is being investigated; in the meantime, I thought we could simply try to catch these bad URLs and redirect them to the home page or something.

    However, the "gotcha" is that these are HTTPS URLs, and you cannot use %{REQUEST_URI} on HTTPS.

    For example, here's a bad URL it's trying to hit:

    https://www.example.com/cart/https://www.example.com/cart/checkout/selectAddressshop/Blow-Out-Deal!-Extra-Loud-Alarm-Clock-with-Green-LED-3-for-19-99-Shipped.207Acer-KG-UXH1P-Dual-Band-VHF-Plus-200-MHZ-Handheld-220-Special!-129-95-Shipped-With-Programming-Cable-and-Software!.137shop/Accessories.23YT34010X3-SMA-FEMALE-to-UHF-female-Fits-Sony-and-more.221acer.info.htmlorder?returnPath=

    If this wasn't HTTPS, I'd do something like:

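    RewriteEngine On
    # Match the site with or without the www prefix
    RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
    # Bogus requests have a path beginning with /cart/https
    RewriteCond %{REQUEST_URI} ^/cart/https
    # Redirect to the cart front page; the trailing ? drops the old query string
    RewriteRule ^ http://www.example.com/cart/? [R=301,L]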

    The syntax may not be right, but what I'm trying to say is: if anyone tries to go to a URL that starts with /cart/https, that is bogus, so redirect them.
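
    To make the intent concrete, here is a minimal Python sketch of the same test the RewriteCond is meant to perform: does the request path begin with /cart/https? It only illustrates the pattern match and says nothing about how Apache evaluates the rule on an SSL virtual host. The function name `is_bogus` is made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the RewriteCond above: the path begins with /cart/https
BOGUS_PATH = re.compile(r"^/cart/https")


def is_bogus(url: str) -> bool:
    """Return True when the URL's path starts with /cart/https."""
    # urlsplit separates scheme and host from the path, so the check
    # behaves the same for http:// and https:// URLs.
    return BOGUS_PATH.match(urlsplit(url).path) is not None


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/cart/https://www.example.com/cart/checkout/selectAddress",
        "https://www.example.com/cart/checkout/selectAddress",
    ):
        print(candidate, is_bogus(candidate))
```

    The check looks only at the path, so the scheme of the incoming request does not affect which URLs count as bogus.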

    But %{REQUEST_URI} doesn't work with HTTPS.

    Any ideas, either to solve this, or where to go for a "consultant" to help figure out a workaround?

    - Scott
     
  2. bhd

    bhd Well-Known Member

    Joined:
    Sep 20, 2003
    Messages:
    149
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    JNB ZA
    cPanel Access Level:
    Root Administrator
    Google honors the robots.txt file as far as I know. Can't you put what you want in there?
     
  3. sneader

    sneader Well-Known Member

    Well, the problem with using robots.txt is that you'd have to enter something like:

    Disallow: /cart/https://www.example.com/

    I'll try it, but it just doesn't look like something it will understand, does it?
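
    One rough way to sanity-check a rule like this locally is Python's standard-library `urllib.robotparser`, which does plain prefix matching on the URL path. This is only a sketch under that assumption: it is not Google's parser, so Googlebot may read the line differently. The helper name `is_allowed` and the sample URLs are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule under discussion, applied to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/https://www.example.com/
"""


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt would let `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/cart/https://www.example.com/cart/checkout/selectAddress",
        "https://www.example.com/cart/checkout/selectAddress",
    ):
        print(candidate, is_allowed(candidate))
```

    Note that this parser compares only the path portion of the URL and ignores the scheme and host, so it cannot say which robots.txt file a crawler will consult for an HTTPS request.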

    - Scott
     
  4. sneader

    sneader Well-Known Member

    In Webmaster Tools, you can test your robots.txt file. I have "Disallow: /cart/https://www.example.com/" in robots.txt. When I feed the tester this URL:

    It says "Not in Domain".

    When I feed it the same URL, but I change the beginning from https to http, then it says "blocked by robots.txt"

    So... I'm sunk. It appears there is NO WAY to control what Google spiders, if it decides to use HTTPS to hit your site.

    That just seems wrong. How do I contact this Matt Cutts guy? :)

    - Scott
     