Changeset 16245
- Timestamp: 04/25/08 16:38:19 (2 months ago)
- Files:
branches/MidCOM_2_8/fi.hut.staticdumps/bin/dump_sites.php
r16188 r16245
@@ -3,9 +3,12 @@
 ini_set('error_reporting', E_ALL);
 
-$wget_options = "-erobots=off -q -m -nH";
-$rsync_options = '-a';
-$http_timeout = 300; // seconds = 5minutes
-$lockfile_path = '/var/run';
-$lockfile_prefix = 'fi_hut_staticdumps_';
+$defaults = array
+(
+    'wget_options' => '-erobots=off -q -m -nH',
+    'rsync_options' => '-a',
+    'http_timeout' => 300, // seconds = 5minutes
+    'lockfile_path' => '/var/run',
+    'lockfile_prefix' => 'fi_hut_staticdumps_',
+);
 
 function better_die($msg)
@@ -98,4 +101,15 @@
 foreach ($sites_config as $k => $site_config)
 {
+    foreach ($defaults as $key => $val)
+    {
+        if (isset($site_config[$key]))
+        {
+            $$key = $site_config[$key];
+        }
+        else
+        {
+            $$key = $val;
+        }
+    }
     if (!isset($site_config['url']))
     {
@@ -197,4 +211,9 @@
 foreach($output as $filepath)
 {
+    if (preg_match('/\.orig$/', $filepath))
+    {
+        // Skip wget --keep-originals .orig files from rename
+        continue;
+    }
     list($filepart, $querypart) = explode('?', $filepath);
     $newpath = dirname($filepart) . "/{$querypart}_" . basename($filepart);
branches/MidCOM_2_8/fi.hut.staticdumps/documentation/USAGE
r15931 r16245
@@ -18,4 +18,14 @@
     'post_dump_script => '', // optional, the url is passed as argument along with general status indicator exit codes some certain prior operations and dump path are passed as arguments
 ),
+
+Also if you wish to override some of the more basic settings there are the following config keys and their default values:
+
+    'wget_options' => '-erobots=off -q -m -nH',
+    'rsync_options' => '-a',
+    'http_timeout' => 300, // seconds = 5minutes
+    'lockfile_path' => '/var/run',
+    'lockfile_prefix' => 'fi_hut_staticdumps_',
+
+The command used to execute `wget` is formed from the wget_options internal default concatenated with the `wget_extra_options` (if defined), ditto for `rsync`. `http_timeout` is used when querying for protected/redirection folder list. If you have multiple nodes sharing the load of dumping a ton of sites `lockfile_path` should point to a shared directory they all can write to, `lockfile_prefix` is configurable for completeness sake.
 
 In the VirtualHost directive of your static apache set the following to handle URLs with GET parameters in them nicely:
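
To illustrate the override rule this changeset introduces, here is a minimal sketch that resolves the per-site settings into an array instead of into variable variables (`$$key`) as the script does. The helpers `resolve_site_settings()` and `build_options()` and the example site entry are assumptions made for this sketch, not code from dump_sites.php. The option joining only follows the USAGE wording about `wget_extra_options`; the script's actual command line is not shown in this diff.

```php
<?php
// Defaults as introduced by this changeset.
$defaults = array
(
    'wget_options' => '-erobots=off -q -m -nH',
    'rsync_options' => '-a',
    'http_timeout' => 300, // seconds
    'lockfile_path' => '/var/run',
    'lockfile_prefix' => 'fi_hut_staticdumps_',
);

// Hypothetical helper: returns one value per default key,
// preferring the per-site value when it is set (same rule as the diff).
function resolve_site_settings(array $defaults, array $site_config)
{
    $settings = array();
    foreach ($defaults as $key => $val)
    {
        $settings[$key] = isset($site_config[$key]) ? $site_config[$key] : $val;
    }
    return $settings;
}

// Hypothetical helper: appends the site's extra options, if defined,
// to the resolved base options.
function build_options($base, array $site_config, $extra_key)
{
    if (isset($site_config[$extra_key]))
    {
        return $base . ' ' . $site_config[$extra_key];
    }
    return $base;
}

// Example site entry (assumed values); only http_timeout is overridden.
$site_config = array
(
    'url' => 'http://www.example.com/',
    'http_timeout' => 60,
    'wget_extra_options' => '--wait=1',
);

$settings = resolve_site_settings($defaults, $site_config);
$wget_opts = build_options($settings['wget_options'], $site_config, 'wget_extra_options');
```

Because `isset()` is false for `null`, a site entry that sets a key to `null` still falls back to the default, which matches the behaviour of the loop in the diff. Returning an array also avoids writing into the surrounding variable scope, at the cost of changing how the rest of the script would read the values.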

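
The rename step touched by the third hunk can be sketched in isolation as follows. The function name `renamed_path()` and the sample paths are hypothetical. The guard for paths without a `?` is an addition of this sketch, since the diff does not show how the script handles that case.

```php
<?php
// Hypothetical helper mirroring the rename logic in the diff:
// returns the new path for a dumped file, or null when the file
// should be left alone.
function renamed_path($filepath)
{
    if (preg_match('/\.orig$/', $filepath))
    {
        // .orig backups are skipped, as in the changeset.
        return null;
    }
    $parts = explode('?', $filepath, 2);
    if (count($parts) < 2)
    {
        // Assumption of this sketch: no query part means nothing to rename.
        return null;
    }
    list($filepart, $querypart) = $parts;
    return dirname($filepart) . "/{$querypart}_" . basename($filepart);
}

$renamed = renamed_path('./news/index.html?page=2');
$skipped = renamed_path('./news/index.html?page=2.orig');
```

With these sample inputs, `$renamed` is `./news/page=2_index.html` and `$skipped` is `null`.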