Posts Tagged ‘S3’

4 years on Amazon Cloud!

Tuesday, September 28th, 2010

I was introduced to Amazon EC2 by a friend who gave me early access to Amazon's cloud infrastructure before it was launched publicly. When Amazon announced a limited public beta on Aug 25, 2006 (we used to read DDJ back then), I got my personal account, and I have been hooked ever since. While working on Tejit, I ran a crawler farm on EC2 with an early implementation of Map-Reduce along with an NLP engine. At its peak, I had around a dozen instances wired together through the Simple Queue Service (SQS) for job propagation. I discovered SQS by chance, while struggling with a Java-RMI based implementation for crawler job assignments.

If I remember correctly, there was only one instance type at launch, m1.small:

the equivalent of a 1.7 GHz Xeon processor, 1.75 GB of RAM, 160 GB of local disk and 250 Mb/second of network bandwidth. You pay just 10 cents per clock hour

During its peak, and for several months before and after, I paid Amazon a lot of money for its cloud infrastructure, especially EC2, and consumed a lot of bandwidth. I'm happy that today I complete 4 years as a paid user of Amazon! Here's a snapshot of my Access Key, which was created on Sep 27, 2006. Viva Amazon!

4 years as a paid user at Amazon Cloud

Using JetS3t to upload a large number of files to S3

Sunday, February 3rd, 2008

I was looking for a tool to upload a large number of files to S3. I have been a great fan of the bash tools for browsing S3 buckets and objects and for managing a limited number of files, but I could not find an easy way to upload a large number of files (the first batch being around 800K).

Then I downloaded JetS3t. It has a neat GUI called Cockpit for managing files on S3. For simple uploads and downloads, though, S3 Organizer, a simple Firefox plugin, does the job. If you need to manage your files extensively, JetS3t's Cockpit is the way to go.

For uploading a large number of files, I wanted something multi-threaded and configurable. The JetS3t suite has a “Synchronize” application meant to synchronize files between a local PC and S3, and it lets you configure the number of threads and connections to the S3 service. So, without reinventing the wheel, I got what I wanted. The one additional thing I needed was the ability to delete the local files once the upload was complete. After tinkering with the Java source, I modified Synchronize.java and added the following code fragments:

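// Needs java.io.File, java.util.ArrayList and java.util.List imported in Synchronize.java
public void uploadLocalDirectoryToS3(FileComparerResults disrepancyResults, Map filesMap,
        Map s3ObjectsMap, S3Bucket bucket, String rootObjectPath, String aclString) throws Exception {
    ...
    // Remember every local file (but not directory) that is queued for upload
    List<String> filesToDelete = new ArrayList<String>();
    ...
    if (!file.isDirectory()) {
        filesToDelete.add(file.getPath());
    }
    ...

    // Delete the local files once their objects are stored in S3
    for (String fName : filesToDelete) {
        File f = new File(fName);
        // File.delete() reports failure by returning false, not by throwing
        if (!f.delete()) {
            System.err.println("Could not delete local file: " + fName);
        }
    }
}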
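The modification deletes the local files in one pass after the upload loop. The same "upload in parallel, then delete locally" idea can be sketched independently of JetS3t, with the extra safeguard that each local file is deleted only if its own upload finished without an exception. This is a minimal sketch, not JetS3t code: the `UploadThenDelete` class, the `Uploader` interface and `uploadAndDelete` are hypothetical names, and a real `Uploader` would wrap the actual S3 put call. The thread count plays the role of Synchronize's configurable thread setting. It needs Java 8 or later.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UploadThenDelete {

    /** Hypothetical stand-in for whatever actually stores a file in S3. */
    public interface Uploader {
        void upload(File file) throws Exception;
    }

    /**
     * Uploads every regular file using a fixed pool of 'threads' workers, then
     * deletes only the local files whose upload completed without an exception.
     * Returns the files that were deleted.
     */
    public static List<File> uploadAndDelete(List<File> files, Uploader uploader, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<File> queued = new ArrayList<>();
        List<Future<?>> results = new ArrayList<>();
        try {
            for (File file : files) {
                if (file.isDirectory()) {
                    continue; // directories are never uploaded or deleted
                }
                queued.add(file);
                // Returning null makes this a Callable, so checked exceptions are allowed
                results.add(pool.submit(() -> { uploader.upload(file); return null; }));
            }
        } finally {
            pool.shutdown(); // already-submitted uploads still run to completion
        }

        List<File> deleted = new ArrayList<>();
        for (int i = 0; i < queued.size(); i++) {
            File file = queued.get(i);
            try {
                results.get(i).get(); // blocks until this particular upload finishes
            } catch (ExecutionException e) {
                System.err.println("Upload failed, keeping " + file + ": " + e.getCause());
                continue;
            }
            if (file.delete()) {
                deleted.add(file);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("upload-demo").toFile();
        List<File> files = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            File f = new File(dir, "file" + i + ".txt");
            Files.write(f.toPath(), ("data " + i).getBytes());
            files.add(f);
        }
        // Fake uploader for the demo: "succeeds" for every file except file3.txt
        Uploader fake = file -> {
            if (file.getName().equals("file3.txt")) {
                throw new IOException("simulated failure");
            }
        };
        List<File> deleted = uploadAndDelete(files, fake, 3);
        System.out.println("deleted=" + deleted.size() + " remaining=" + dir.list().length);
        // prints "deleted=4 remaining=1"
    }
}
```

Checking each upload's `Future` before deleting means a single failed upload costs you a retry, not the only copy of the file.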