general/misc/
file_http_copy.pro
WARNING: The interface to this routine is not yet finalized. Use the wrapper routine
FILE_RETRIEVE instead. This routine is still under development.
NAME:
file_http_copy
PURPOSE:
Downloads file(s) from HTTP servers.
Also performs searches without downloading.
Copies the file to a user specified local directory.
By default, files are only downloaded if the remote file is newer than
the local file (based on mtime) or if the files differ in size.
This routine is intended for use with simple HTTP file servers.
Wildcard matching and recursive file searching can be used as well.
CALLING SEQUENCE: There are two methods:
Method 1:
FILE_HTTP_COPY, pathnames, SERVERDIR=serverdir, LOCALDIR=localdir
where:
pathnames = (input string(s), scalar or array) Relative path name(s) of the file(s) to download.
serverdir = (scalar input string) Root name of source URL, must
begin with: 'http://' and end with '/'
localdir = (scalar input string) Root name of local directory, typically
ends with '/'
Note: The source is at: serverdir + pathnames
The destination is: localdir + pathnames
Method 2:
FILE_HTTP_COPY, URL
URL = full URL(S) of source file
Directory structure is not retained with this procedure
Example:
FILE_HTTP_COPY, 'ssl_general/misc/file_http_copy.pro', $
SERVERDIR='http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/', $
LOCALDIR='myidl/'
Note: Unix-style directory separators '/' should be used throughout. This convention also
works on Windows.
Alternate calling sequence:
FILE_HTTP_COPY, URL
where URL is an input string giving the full URL of the source file, as in the sketch below.
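A minimal sketch of this form (LOCALNAMES appears in the full signature under 'Routine details'
below and is assumed here to return the name(s) of the local file(s) written; the URL is the one
used in the Examples section):
FILE_HTTP_COPY, 'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro', $
    LOCALNAMES=localnames
print, localnames    ; the file is placed in the current directory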
INPUTS:
URL - scalar or array string giving a fully qualified url
OPTIONAL KEYWORDS:
NO_CLOBBER: (0/1) Set this keyword to prevent overwriting local files.
NO_UPDATE: (0/1) Set this keyword to prevent contacting the remote server to update existing files. Ignored for directory listings.
IGNORE_FILESIZE: (0/1) Set this keyword to ignore file size when
evaluating need to download.
NO_DOWNLOAD: (0/1,2) Set this keyword to prevent file downloads (url_info
is still returned)
URL_INFO=url_info: (output) Named variable that returns information about the
remote file, such as modification time and file size, as determined
from the HTTP header. Zero is returned if the remote file is
invalid.
FILE_MODE= file_mode: If non-zero, sets the permissions for downloaded files.
DIR_MODE = dir_mode: Sets permissions for newly created directories
(Useful for shared directories)
ASCII_MODE: (0/1) Set to 1 to force files to be downloaded as ASCII text files (converts CR/LF line endings).
Setting this keyword also forces IGNORE_FILESIZE to be set, because the local and
remote files will typically differ in size.
USER_PASS: string with format: 'user:password' for sites that require Basic authentication. Digest authentication is not supported.
VERBOSE: (input; integer) Sets the verbosity level (uses "DPRINT"):
0 - nearly silent; 2 - typical messages; 4 - debugging info.
PRESERVE_MTIME: Uses the server modification time instead of the local modification time. This keyword is ignored
on Windows machines that do not have 'touch' installed (no Cygwin or GNU utilities).
Note: The PRESERVE_MTIME option is experimental and highly platform
dependent. Behavior may change in future releases, so use with
caution.
Examples:
;Download most recent version of this file to current directory:
FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro'
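;Check a remote file without downloading it (a sketch using the NO_DOWNLOAD and URL_INFO
;keywords documented above; the exact contents of the returned structure are not spelled out here):
FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro', $
    NO_DOWNLOAD=1, URL_INFO=info
PRINTDAT, info    ; remote modification time and file size as reported by the server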
OPTIONAL INPUT KEYWORD PARAMETERS:
PATHNAME = pathname ; pathname is the filename to be created.
If the directory does not exist then it will be created.
If PATHNAME does not exist then the original filename is used
and placed in the current directory.
RESTRICTIONS:
PROXY: If you are behind a firewall and have to access the net through a
Web proxy, set the environment variable 'http_proxy' to point to
your proxy server and port, e.g.
setenv, 'http_proxy=http://web-proxy.mpia-hd.mpg.de:3128'
setenv, 'http_proxy=http://www-proxy1.external.lmco.com'
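The setting can be verified from within an IDL session (a small sketch; GETENV is standard IDL
and simply echoes what was set):
setenv, 'http_proxy=http://web-proxy.mpia-hd.mpg.de:3128'
print, getenv('http_proxy')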
The URL *MUST* begin with "http://".
PROCEDURE:
Open a socket to the webserver and download the header.
EXPLANATION:
FILE_HTTP_COPY can access HTTP servers, even from behind a firewall, and
perform simple downloads.
Requires IDL V5.4 or later on Unix or Windows, and V5.6 or later on
Macintosh.
EXAMPLE:
IDL> FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/themisdata/thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf'
IDL> PRINTDAT, file_info('thg_l1_asf_whit_2006010103_v01.cdf')
or
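equivalently, with the SERVERDIR/LOCALDIR form (a sketch, not part of the original header: the
split between server root and pathname is a choice made here, 'mydata/' is a hypothetical local
directory, and this form retains the directory structure under LOCALDIR):
FILE_HTTP_COPY, 'thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf', $
    SERVERDIR='http://themis.ssl.berkeley.edu/themisdata/', LOCALDIR='mydata/'
PRINTDAT, file_info('mydata/thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf')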
MINIMUM IDL VERSION:
V5.4 (uses SOCKET)
MODIFICATION HISTORY:
Original version: WEBGET()
Written by M. Feldt, Heidelberg, Oct 2001
Routines
Routines from file_http_copy.pro
result = encode_url(urln)
result = compare_urls(url1, url2)
extract_html_links, s, links, relative=relative, normal=normal
extract_html_links_regex, s, links, relative=relative, normal=normal, no_parent_links=no_parent_links
result = file_extract_html_links(filename, count, verbose=verbose, no_parent_links=no_parent_links)
result = file_http_strip_domain(s)
result = file_http_is_parent_dir(current, link)
result = file_http_header_element(header, name)
file_http_header_info, Header, hi, verbose=verbose
file_http_copy, pathnames, newpathnames, recurse_limit=recurse_limit, verbose=verbose, serverdir=serverdir, localdir=localdir, localnames=localnames, file_mode=file_mode, dir_mode=dir_mode, last_version=last_version, min_age_limit=min_age_limit, host=host, user_agent=user_agent, user_pass=user_pass, preserve_mtime=preserve_mtime, restore_mtime=restore_mtime, if_modified_since=if_modified_since, ascii_mode=ascii_mode, no_globbing=no_globbing, no_clobber=no_clobber, archive_ext=archive_ext, archive_dir=archive_dir, no_update=no_update, no_download=no_download, ignore_filesize=ignore_filesize, ignore_filedate=ignore_filedate, url_info=url_info, progobj=progobj, links=links, force_download=force_download, error=error
Routine details
extract_html_links
extract_html_links, s, links, relative=relative, normal=normal
Parameters
- s
- links
Keywords
- relative
- normal
extract_html_links_regex
extract_html_links_regex, s, links, relative=relative, normal=normal, no_parent_links=no_parent_links
Parameters
- s
- links
Keywords
- relative
- normal
- no_parent_links
file_extract_html_links
result = file_extract_html_links(filename, count, verbose=verbose, no_parent_links=no_parent_links)
Parameters
- filename
- count
Keywords
- verbose
- no_parent_links
file_http_is_parent_dir
result = file_http_is_parent_dir(current, link)
Parameters
- current
- link
file_http_header_element
result = file_http_header_element(header, name)
Parameters
- header
- name
file_http_header_info
file_http_header_info, Header, hi, verbose=verbose
Parameters
- Header
- hi
Keywords
- verbose
file_http_copy
file_http_copy, pathnames, newpathnames, recurse_limit=recurse_limit, verbose=verbose, serverdir=serverdir, localdir=localdir, localnames=localnames, file_mode=file_mode, dir_mode=dir_mode, last_version=last_version, min_age_limit=min_age_limit, host=host, user_agent=user_agent, user_pass=user_pass, preserve_mtime=preserve_mtime, restore_mtime=restore_mtime, if_modified_since=if_modified_since, ascii_mode=ascii_mode, no_globbing=no_globbing, no_clobber=no_clobber, archive_ext=archive_ext, archive_dir=archive_dir, no_update=no_update, no_download=no_download, ignore_filesize=ignore_filesize, ignore_filedate=ignore_filedate, url_info=url_info, progobj=progobj, links=links, force_download=force_download, error=error
Parameters
- pathnames
- newpathnames
Keywords
- recurse_limit
- verbose
- serverdir
- localdir
- localnames
- file_mode
- dir_mode
- last_version
- min_age_limit
- host
- user_agent
- user_pass
- preserve_mtime
- restore_mtime
- if_modified_since
- ascii_mode
- no_globbing
- no_clobber
- archive_ext
- archive_dir
- no_update
- no_download
- ignore_filesize
- ignore_filedate
- url_info
- progobj
- links
- force_download
- error
File attributes
Modification date: Sat Dec 6 11:10:08 2014
Lines: 568