rsync_distro: Grab Mass Quantities of Data

Download: rsync_distro.gz

This is a simple Perl script that works with crontab to download Linux distributions (or any other file sets) from an rsync server during off hours, when there is less network traffic. This version assumes a permanent connection, but a few mods can be added to have it dial up a modem first (email me if you need to do this; I remember the old modem days).

This file can also easily be modified to use wget instead of rsync (email me if you need help).
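
As a rough idea of what the wget change might look like, here is a minimal sketch that builds a wget command line to stand in for the script's $RSYNC call. The mirror URL and the wget_command helper are made up for illustration, and the flag choice (-m, -np, -nH, -nv, -P) is just one plausible combination, not the only way to do it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values: substitute a real mirror URL and local directory.
my $distro_url = 'ftp://mirror.example.org/pub/suse/i386/9.2/';
my $target_dir = 'SuSE-9.2';

# Build a wget command roughly comparable to the script's rsync call:
#   -m   mirror (recursion plus time-stamping)
#   -np  never ascend to the parent directory
#   -nH  do not create a directory named after the host
#   -nv  less verbose output, about one line per file
#   -P   directory prefix where the files are saved
sub wget_command {
    my ($url, $dir) = @_;
    return "wget -m -np -nH -nv -P '$dir' '$url'";
}

my $WGET = wget_command($distro_url, $target_dir);
print "$WGET\n";

# In the wrapper, the rsync line would then become something like:
#   $rsync_out = `$WGET 2>&1`;
# (wget writes its messages to stderr, hence the redirect.)
```

The crontab side stays the same, except that the second line would need to kill wget rather than rsync.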

The following lines from my crontab download data in six 50-minute sessions, one starting each hour from 1:00 AM through 6:00 AM:

00 1-6 * * * /usr/local/bin/rsync_distro
50 1-6 * * * killall -v rsync

When downloading large files, it may be better to use sessions of about 1 hour and 50 minutes instead:

01 1,3,5 * * * /usr/local/bin/rsync_distro
50 2,4,6 * * * killall rsync

I take the ten-minute breaks because I had "peer reset" trouble after about 3.5 hours of continuous downloading. This trick took care of the problem and has worked reliably.
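
To double-check how long each session runs under the two schedules, the arithmetic can be sketched in a few lines of Perl. The expand_hours and session_minutes helpers are hypothetical names used only for this illustration, and they handle just the comma lists and simple ranges that appear in the crontab lines above, not the full crontab syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a crontab hour field such as "1-6" or "1,3,5" into a list of hours.
# Only comma lists and simple ranges are handled; step values are not.
sub expand_hours {
    my ($field) = @_;
    my @hours;
    for my $part (split /,/, $field) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @hours, $1 .. $2;
        } else {
            push @hours, $part;
        }
    }
    return @hours;
}

# Pair each start time with the first kill time after it and return the
# session lengths in minutes.
sub session_minutes {
    my ($start_min, $start_hours, $kill_min, $kill_hours) = @_;
    my @starts = map { $_ * 60 + $start_min } expand_hours($start_hours);
    my @kills  = map { $_ * 60 + $kill_min }  expand_hours($kill_hours);
    my @lengths;
    for my $start (@starts) {
        my ($kill) = grep { $_ > $start } @kills;
        push @lengths, $kill - $start if defined $kill;
    }
    return @lengths;
}

my @hourly = session_minutes(0, "1-6",   50, "1-6");
my @long   = session_minutes(1, "1,3,5", 50, "2,4,6");
print "Hourly schedule: @hourly\n";   # six sessions of 50 minutes
print "Long schedule:   @long\n";     # three sessions of 109 minutes
```

So the hourly schedule gives 300 minutes of downloading per night, and the longer one gives 327 minutes, with a break of about ten minutes after each session in both cases.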

Once your download is complete, comment out the crontab lines so it won't keep trying indefinitely, unless of course there are frequent changes on the server and you want to stay up to date.

If you don't want emails telling you that killall killed rsync, then delete the "-v".

Here's the listing of the Perl wrapper for rsync:

#!/usr/bin/perl
#
# This is a script to download Linux distribution updates and notify you.
#
#  Script written by Brent Canipe & David Edwards <david@btdt.org>
#  Tue Sep 21 19:29:41 PDT 2004
#
#  Version 1.1: D. Edwards Sun Apr  3 17:56:42 PDT 2005
#   Added --log-format formatting

###############################################################
# Variables need to be set for local server
###############################################################
$SERVER="Husky";
$MAIL_PROGRAM="/usr/lib/sendmail -t";
$MY_EMAIL="edwards";
$To_EMAIL="edwards";
#$Cc_EMAIL="Edwards";
$BasePath="/r/pkg";
#$RSYNC='rsync -rlptv --partial';

# See rsyncd.conf man page for other --log-format options.
#   %o for the operation, which is either "send" or "recv"
#   %f for the filename
#   %b for the number of bytes actually transferred
#   %l for the length of the file in bytes

$RSYNC='rsync -rlpt --partial --log-format="%o %f (got %b of %l bytes)"';

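# Stop here if the download directory is missing, rather than
# downloading into whatever directory cron started us in.
chdir $BasePath or die "Cannot chdir to $BasePath: $!\n";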

# Change these to the server and directory tree to be downloaded
$distro_server='distro.ibiblio.org::distros';
$name='SuSE';
$version='9.2';
$URL="suse/suse/i386/$version";
&get_updates;

# Add additional sources to download here...
#$distro_server='distro.ibiblio.org::distros';
#$name='SuSE';
#$version='live-cd-9.2';
#$URL="suse/suse/i386/$version";
#&get_updates;

###############################################
## Get new and updated Linux distribution files
###############################################
sub get_updates {

    if (! -d "$name-$version") {
        mkdir "$name-$version", 0777;
    }

    $start_time=`date '+%D %X'`;
    chomp($start_time);

    $rsync_out=`$RSYNC $distro_server/$URL $name-$version`;

    $stop_time=`date '+%D %X'`;
    chomp($stop_time);

    # Process log format: %b returns a varying number of bytes more
    # than the actual size of the file.
    # Sometimes %b is just plain wrong and says it received a partial
    # file when the complete file was actually downloaded.

    if ($RSYNC =~ /\(got %b of %l bytes\)/) {
        #$rsync_out =~ s/got (\d+)/sprintf("got %d", $1-60)/ge

        $rsync_out =~  s/got (\d+) of (\d+) bytes/
                            if ($1 < $2) {
                                sprintf("**** %d bytes short ****", $2-$1);
                            } else {
                                sprintf("%d bytes", $2);
                            }
                        /ge;
    }

    &email;
}


###############################################
## Send out an email update notice
###############################################

sub email {


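# Only send mail when rsync actually reported something. (The old test
# looked at a file, rsync.$name-$version.out, that this script never writes.)
if (defined $rsync_out && $rsync_out ne "") {

open (MAIL_RECIP, "|$MAIL_PROGRAM") or die "Cannot run $MAIL_PROGRAM: $!\n";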
print MAIL_RECIP << "__STOP_OF_MAIL2__";
To: $To_EMAIL
From: $MY_EMAIL
Cc: $Cc_EMAIL
Subject: New $name $version Updates

Hey Dude,

Output from crontab:rsync_distro:
 $SERVER:$0

 Start time: $start_time
 Stop time:  $stop_time


We have some new $name $version updates in $BasePath/$name-$version.

   $RSYNC $distro_server/$URL $name-$version

$rsync_out

Your Humble Servant,

   $SERVER

__STOP_OF_MAIL2__

close(MAIL_RECIP);

}
}
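
To see what the byte-count cleanup in get_updates does to the rsync output, here is a small standalone sketch. The summarize_bytes helper and the two log lines are made up for illustration (the lines only imitate the layout produced by the --log-format string above), and the substitution is written with a ternary rather than the if/else used in the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite "got B of L bytes" into either a shortfall warning or the file size.
sub summarize_bytes {
    my ($log) = @_;
    $log =~ s{got (\d+) of (\d+) bytes}{
        $1 < $2 ? sprintf("**** %d bytes short ****", $2 - $1)
                : sprintf("%d bytes", $2)
    }ge;
    return $log;
}

# Made-up sample lines in the script's --log-format layout.
my $sample = join "\n",
    "recv suse/i386/9.2/a.rpm (got 1060 of 1000 bytes)",
    "recv suse/i386/9.2/b.rpm (got 900 of 1000 bytes)";

my $summary = summarize_bytes($sample);
print "$summary\n";
```

The first line comes out as "(1000 bytes)" because %b overshot the file length, and the second is flagged as "**** 100 bytes short ****".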


Copyright © 1999-2012

Last modified:  Saturday, 25-Feb-2006 23:10:25 MST