general/misc/
file_http_copy.pro
WARNING: The interface to this routine is not yet finalized. Use the wrapper routine
FILE_RETRIEVE instead. This routine is still under development.
NAME:
file_http_copy
PURPOSE:
Downloads file(s) from HTTP servers.
Also performs searches without downloading.
Copies the file to a user specified local directory.
By default, files are only downloaded if the remote file is newer than
the local file (based on mtime) or if the files differ in size.
This routine is intended for use with simple HTTP file servers.
Wildcard matching and recursive file searching can be used as well.
CALLING SEQUENCE: There are two methods:
Method 1:
FILE_HTTP_COPY, pathnames, SERVERDIR=serverdir, LOCALDIR=localdir
where:
pathnames = (input string(s), scalar or array) Relative path name(s) of the file(s) to download.
serverdir = (scalar input string) Root name of source URL, must
begin with: 'http://' and end with '/'
localdir = (scalar input string) Root name of local directory, typically
ends with '/'
Note: The source is at: serverdir + pathnames
The destination is: localdir + pathnames
Method 2:
FILE_HTTP_COPY, URL
URL = full URL(S) of source file
Directory structure is not retained with this procedure
Example:
FILE_HTTP_COPY, 'ssl_general/misc/file_http_copy.pro', $
SERVERDIR='http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/', $
LOCALDIR='myidl/'
Note: Unix-style directory separators '/' should be used throughout. This convention also
works on Windows.
Alternate calling sequence:
FILE_HTTP_COPY, URL
where URL is an input string giving the full URL of the source file, as in the sketch below.
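A minimal sketch of this form (LOCALNAMES appears in the full signature under 'Routine details'
below and is assumed here to return the name(s) of the local file(s) written; the URL is the one
used in the Examples section):
FILE_HTTP_COPY, 'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro', $
    LOCALNAMES=localnames
print, localnames    ; the file is placed in the current directory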
INPUTS:
URL - scalar or array string giving a fully qualified url
OPTIONAL KEYWORDS:
NO_CLOBBER: (0/1) Set this keyword to prevent overwriting local files.
NO_UPDATE: (0/1) Set this keyword to prevent contacting the remote server to update existing files. Ignored for directory listings.
IGNORE_FILESIZE: (0/1) Set this keyword to ignore file size when
evaluating need to download.
NO_DOWNLOAD: (0/1,2) Set this keyword to prevent file downloads (url_info
is still returned)
URL_INFO=url_info: (output) Named variable that returns information about the
remote file, such as modification time and file size, as determined
from the HTTP header. Zero is returned if the remote file is
invalid.
FILE_MODE= file_mode: If non-zero, sets the permissions for downloaded files.
DIR_MODE = dir_mode: Sets permissions for newly created directories
(Useful for shared directories)
ASCII_MODE: (0/1) Set to 1 to force files to be downloaded as ASCII text files (converts CR/LF line endings).
Setting this keyword also forces IGNORE_FILESIZE to be set, because the local and
remote files will typically differ in size.
USER_PASS: string with format: 'user:password' for sites that require Basic authentication. Digest authentication is not supported.
VERBOSE: (input; integer) Sets the verbosity level (uses "DPRINT"):
0 - nearly silent; 2 - typical messages; 4 - debugging info.
PRESERVE_MTIME: Uses the server modification time instead of the local modification time. This keyword is ignored
on Windows machines that do not have 'touch' installed (no Cygwin or GNU utilities).
Note: The PRESERVE_MTIME option is experimental and highly platform
dependent. Behavior may change in future releases, so use with
caution.
Examples:
;Download most recent version of this file to current directory:
FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro'
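;Check a remote file without downloading it (a sketch using the NO_DOWNLOAD and URL_INFO
;keywords documented above; the exact contents of the returned structure are not spelled out here):
FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/data/themis/socware/bleeding_edge/idl/ssl_general/misc/file_http_copy.pro', $
    NO_DOWNLOAD=1, URL_INFO=info
PRINTDAT, info    ; remote modification time and file size as reported by the server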
OPTIONAL INPUT KEYWORD PARAMETERS:
PATHNAME = pathname ; pathname is the filename to be created.
If the directory does not exist then it will be created.
If PATHNAME does not exist then the original filename is used
and placed in the current directory.
RESTRICTIONS:
PROXY: If you are behind a firewall and have to access the net through a
Web proxy, set the environment variable 'http_proxy' to point to
your proxy server and port, e.g.
setenv, 'http_proxy=http://web-proxy.mpia-hd.mpg.de:3128'
setenv, 'http_proxy=http://www-proxy1.external.lmco.com'
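The setting can be verified from within an IDL session (a small sketch; GETENV is standard IDL
and simply echoes what was set):
setenv, 'http_proxy=http://web-proxy.mpia-hd.mpg.de:3128'
print, getenv('http_proxy')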
The URL *MUST* begin with "http://".
PROCEDURE:
Open a socket to the webserver and download the header.
EXPLANATION:
FILE_HTTP_COPY can access HTTP servers, even from behind a firewall, and
perform simple downloads.
Requires IDL V5.4 or later on Unix or Windows, and V5.6 or later on
Macintosh.
EXAMPLE:
IDL> FILE_HTTP_COPY,'http://themis.ssl.berkeley.edu/themisdata/thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf'
IDL> PRINTDAT, file_info('thg_l1_asf_whit_2006010103_v01.cdf')
or
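equivalently, with the SERVERDIR/LOCALDIR form (a sketch, not part of the original header: the
split between server root and pathname is a choice made here, 'mydata/' is a hypothetical local
directory, and this form retains the directory structure under LOCALDIR):
FILE_HTTP_COPY, 'thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf', $
    SERVERDIR='http://themis.ssl.berkeley.edu/themisdata/', LOCALDIR='mydata/'
PRINTDAT, file_info('mydata/thg/l1/asi/whit/2006/thg_l1_asf_whit_2006010103_v01.cdf')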
MINIMUM IDL VERSION:
V5.4 (uses SOCKET)
MODIFICATION HISTORY:
Original version: WEBGET()
Written by M. Feldt, Heidelberg, Oct 2001
Routines
Routines from file_http_copy.pro
result = encode_url(urln)
result = compare_urls(url1, url2)
extract_html_links, s, links, relative=relative, normal=normal
extract_html_links_regex, s, links, relative=relative, normal=normal, no_parent_links=no_parent_links
result = file_extract_html_links(filename, count, verbose=verbose, no_parent_links=no_parent_links)
result = file_http_strip_domain(s)
result = file_http_is_parent_dir(current, link)
result = file_http_header_element(header, name)
file_http_header_info, Header, hi, verbose=verbose
file_http_copy, pathnames, newpathnames, recurse_limit=recurse_limit, verbose=verbose, serverdir=serverdir, localdir=localdir, localnames=localnames, file_mode=file_mode, dir_mode=dir_mode, last_version=last_version, min_age_limit=min_age_limit, host=host, user_agent=user_agent, user_pass=user_pass, preserve_mtime=preserve_mtime, restore_mtime=restore_mtime, if_modified_since=if_modified_since, ascii_mode=ascii_mode, no_globbing=no_globbing, no_clobber=no_clobber, archive_ext=archive_ext, archive_dir=archive_dir, no_update=no_update, no_download=no_download, ignore_filesize=ignore_filesize, ignore_filedate=ignore_filedate, url_info=url_info, progobj=progobj, links=links, force_download=force_download, error=error
Routine details
extract_html_links
extract_html_links, s, links, relative=relative, normal=normal
Parameters
- s
- links
Keywords
- relative
- normal
extract_html_links_regex
extract_html_links_regex, s, links, relative=relative, normal=normal, no_parent_links=no_parent_links
Parameters
- s
- links
Keywords
- relative
- normal
- no_parent_links
file_extract_html_links
result = file_extract_html_links(filename, count, verbose=verbose, no_parent_links=no_parent_links)
Parameters
- filename
- count
Keywords
- verbose
- no_parent_links
file_http_is_parent_dir
result = file_http_is_parent_dir(current, link)
Parameters
- current
- link
file_http_header_element
result = file_http_header_element(header, name)
Parameters
- header
- name
file_http_header_info
file_http_header_info, Header, hi, verbose=verbose
Parameters
- Header
- hi
Keywords
- verbose
file_http_copy
file_http_copy, pathnames, newpathnames, recurse_limit=recurse_limit, verbose=verbose, serverdir=serverdir, localdir=localdir, localnames=localnames, file_mode=file_mode, dir_mode=dir_mode, last_version=last_version, min_age_limit=min_age_limit, host=host, user_agent=user_agent, user_pass=user_pass, preserve_mtime=preserve_mtime, restore_mtime=restore_mtime, if_modified_since=if_modified_since, ascii_mode=ascii_mode, no_globbing=no_globbing, no_clobber=no_clobber, archive_ext=archive_ext, archive_dir=archive_dir, no_update=no_update, no_download=no_download, ignore_filesize=ignore_filesize, ignore_filedate=ignore_filedate, url_info=url_info, progobj=progobj, links=links, force_download=force_download, error=error
Parameters
- pathnames
- newpathnames
Keywords
- recurse_limit
- verbose
- serverdir
- localdir
- localnames
- file_mode
- dir_mode
- last_version
- min_age_limit
- host
- user_agent
- user_pass
- preserve_mtime
- restore_mtime
- if_modified_since
- ascii_mode
- no_globbing
- no_clobber
- archive_ext
- archive_dir
- no_update
- no_download
- ignore_filesize
- ignore_filedate
- url_info
- progobj
- links
- force_download
- error
File attributes
Modification date: Sat Dec 6 11:10:08 2014
Lines: 568