{"id":132,"date":"2013-01-13T14:05:09","date_gmt":"2013-01-13T13:05:09","guid":{"rendered":"http:\/\/blog.jeremm.fr\/?p=132"},"modified":"2014-09-24T21:13:07","modified_gmt":"2014-09-24T19:13:07","slug":"charger-vos-sites-dans-le-cache-varnish-avec-le-sitemap","status":"publish","type":"post","link":"https:\/\/blog.jeremm.fr\/?p=132","title":{"rendered":"Loading your sites into the Varnish cache with the sitemap (crawler)"},"content":{"rendered":"<p>If your sites are not visited regularly, their pages drop out of your Varnish cache. You could raise the TTL of your pages, but if you update your sites often, visitors would then be served stale content.<br \/>\nThe solution is a cron job that crawls your site. The script below reads the sitemap and, for each URL it lists, downloads the page with its requisites (images, CSS, JavaScript) and follows the links to the same site; it then requests every downloaded file again, accepting gzip compression, once with the saved cookies and once without. Because it mounts a tmpfs, it normally has to run as root:<\/p>\n<pre lang=\"bash\">#!\/bin\/bash\r\n# Crawl the URLs listed in the sitemap(s) so that Varnish caches them\r\n\r\nSitemap='www.exemple.com\/sitemap.xml'   # one or more sitemaps, space-separated\r\nSite='www.exemple.com'                  # only links to this host are followed\r\nwget='\/usr\/bin\/wget'\r\ngrep='\/bin\/grep'\r\nfind='\/usr\/bin\/find'\r\nmount='\/bin\/mount'\r\numount='\/bin\/umount'\r\nsed='\/bin\/sed'\r\ncut='\/usr\/bin\/cut'\r\n\r\n# Stop if a previous run is still going (its tmpfs is still mounted)\r\nif $mount -l | $grep -q ' \/tmp\/wget ' ; then exit 1 ; fi\r\n\r\n# Work in a small RAM-backed filesystem (mounting it requires root)\r\nmkdir -p \/tmp\/wget\r\n$mount -t tmpfs -o size=25M tmpfs \/tmp\/wget || exit 1\r\n# Unmount and clean up on exit, even if the script fails or is interrupted\r\ntrap 'cd \/tmp ; $umount \/tmp\/wget ; rm -rf \/tmp\/wget \/tmp\/cookies.txt' EXIT\r\ncd \/tmp\/wget || exit 1\r\n\r\ntouch \/tmp\/cookies.txt\r\n# Ask the site to regenerate its sitemap first (replace with your own URL)\r\n$wget -q -O \/dev\/null \"URL to regenerate the sitemap\"\r\n\r\nfor site in $Sitemap ; do\r\n   # grep -o also copes with sitemaps written on a single line\r\n   for url in $($wget -q -O - \"http:\/\/$site\" | $grep -o '<loc>[^<]*<\/loc>' | $sed -e 's:<loc>::' -e 's:<\/loc>::') ; do\r\n      # Download the page and its requisites (images, CSS, JS)\r\n      $wget -q -p --save-cookies \/tmp\/cookies.txt --load-cookies \/tmp\/cookies.txt \"$url\"\r\n      # Follow the links to the same site found in the downloaded pages\r\n      for index in $($find . -type f | $sed 's:^.\/::' | $grep index.html) ; do\r\n            for href in $($grep -o \"href=[\\\"'][^\\\"']*\" \"$index\" | $cut -c7- | $grep -F \":\/\/$Site\") ; do\r\n                  $wget -q -p --save-cookies \/tmp\/cookies.txt --load-cookies \/tmp\/cookies.txt --header='Accept-Encoding: gzip, deflate' \"$href\"\r\n                  $wget -q -p --header='Accept-Encoding: gzip, deflate' \"$href\"\r\n            done\r\n      done\r\n      # Request every downloaded file again, accepting gzip, with and without cookies\r\n      for file in $($find . -type f | $sed 's:^.\/::' | $grep -v index.html) ; do\r\n          $wget -q -O \/dev\/null --save-cookies \/tmp\/cookies.txt --load-cookies \/tmp\/cookies.txt --header='Accept-Encoding: gzip, deflate' \"http:\/\/$file\"\r\n          $wget -q -O \/dev\/null --header='Accept-Encoding: gzip, deflate' \"http:\/\/$file\"\r\n      done\r\n      # Same for the pages themselves, requested as directory URLs (index.html stripped)\r\n      for file in $($find . -type f | $sed 's:^.\/::' | $grep index.html | $sed 's:index.html::') ; do\r\n          $wget -q -O \/dev\/null --save-cookies \/tmp\/cookies.txt --load-cookies \/tmp\/cookies.txt --header='Accept-Encoding: gzip, deflate' \"http:\/\/$file\"\r\n          $wget -q -O \/dev\/null --header='Accept-Encoding: gzip, deflate' \"http:\/\/$file\"\r\n      done\r\n      rm -rf \/tmp\/wget\/*\r\n   done\r\ndone<\/pre>\n<p>Update 13\/01\/12: added crawling of href links<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If your sites are not visited regularly, their pages drop out of your Varnish cache. You could raise the TTL of your pages, but if you update your sites often, visitors would then be served stale content.\nThe solution is a cron job that crawls your site. The script below reads the sitemap and, for each URL <a href='https:\/\/blog.jeremm.fr\/?p=132' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[35,36,9],"_links":{"self":[{"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/posts\/132"}],"collection":[{"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=132"}],"version-history":[{"count":33,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/posts\/132\/revisions"}],"predecessor-version":[{"id":886,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=\/wp\/v2\/posts\/132\/revisions\/886"}],"wp:attachment":[{"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=132"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=132"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jeremm.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=132"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}