Amazon Web Services
Amazon Web Services (AWS) is a cloud platform comprising services for computation (EC2), storage (S3), and data transfer. Many setiQuest projects and web servers are hosted on AWS.
For Jill Tarter's 2009 TED wish, Amazon donated AWS services amounting to 45,000 hours of EC2 processing, 40 TB of data storage, and 1 TB of network data transfer per month. The initial donation offer was for 5 years. In late 2010 it was announced that the offer had been extended by an additional year. At 45,000 hours per month, the donation covers roughly 60 machine instances running continuously (24x7), which is a serious amount of processing power.
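The instance-count estimate can be checked with a few lines of arithmetic. This is a minimal sketch; the function name is illustrative only, and the result depends on how many days the month is assumed to have.

```python
def continuous_instances(hours_per_month, days_in_month=30):
    """Number of instances that can run 24x7 on a monthly hour budget."""
    return hours_per_month / (24 * days_in_month)

donated_hours = 45_000  # EC2 hours per month, from the donation described above

# A 30-day month and a 31-day month bracket the "about 60" figure.
print(round(continuous_instances(donated_hours), 1))
print(round(continuous_instances(donated_hours, 31), 1))
```

Either way the budget supports a little over 60 always-on instances, assuming every hour is spent on a single-instance hour.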
Amazon Machine Image
The Amazon Machine Image (AMI) is a 32-bit or 64-bit Linux distribution that runs on the virtual EC2 instances. The stock AMI is based on CentOS 5, which is a fork of the Red Hat Enterprise Linux (RHEL) distribution. The stock AMI has been kept as small as possible, so most of the standard Linux packages are missing, but they can easily be installed with the "yum" utility. The compiler in the stock AMI is gcc 4.1.2, but version 4.4 is available in the yum package repository. Other community machine images, such as Fedora, openSUSE, and Ubuntu, are available on AWS, but they are not as well supported.
On November 1st, 2010, Amazon began offering an EC2 Micro Instance free tier to new customers for one year. The offer consists of 750 EC2 Micro Instance hours per month, which is enough to run one instance continuously. The offer also includes 10 GB of Elastic Block Store (EBS) storage, 5 GB of S3 storage, 30 GB of network bandwidth (15 GB in and 15 GB out), and numerous other cloud services. The Micro Instance can run in either 32-bit or 64-bit CPU mode.
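The claim that 750 hours is enough for continuous operation can be verified against the longest possible month. The sketch below uses only the standard library; the constant and function names are illustrative.

```python
import calendar

FREE_TIER_HOURS = 750  # Micro Instance hours per month, from the offer above

def hours_in_month(year, month):
    """Total wall-clock hours in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return 24 * days

# The longest months have 31 days, so checking one full year is sufficient.
longest = max(hours_in_month(2011, m) for m in range(1, 13))
print(longest, longest <= FREE_TIER_HOURS)
```

Because the allowance exceeds the hours in a 31-day month, a single Micro Instance never runs out of free hours, but a second instance would.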
The free Micro Instance defaults to using the stock Linux AMI. An openSUSE machine image is available on AWS but its use is prohibited for the free Micro Instance offer due to licensing costs.
The Micro Instance can burst up to 2 EC2 compute units and has 613 MB of RAM. This works out to brief access to a full core of a 2.66 GHz Xeon processor, but heavy CPU usage gets throttled because of the time-sharing nature of the service. Unfortunately, this means that the Micro Instance is not suitable for processor-intensive applications.
The Micro Instance uses EBS and does not have any swap space. This means that the total RAM usage by the kernel and all processes must stay below 613 MB. It is currently not possible to run SonATA on an AWS Micro Instance.
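On a Linux instance, the swap and memory situation can be inspected through /proc/meminfo. The following is a rough sketch rather than a definitive tool: the helper names are made up for illustration, and treating free memory plus buffers and page cache as "headroom" is a simplifying assumption.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info

def headroom_kb(info):
    """Approximate memory usable without swap: free plus reclaimable caches."""
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)

# Only runs where the Linux proc filesystem is available.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("swap configured:", info.get("SwapTotal", 0) > 0)
    print("headroom (MB):", headroom_kb(info) // 1024)
```

With no swap configured, a process that needs more than the reported headroom is likely to fail or be killed rather than slow down.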
High-performance computing (HPC) used to be the realm of dedicated supercomputers. Today HPC has moved to the cloud. AWS has several EC2 instance types that fall into the HPC category:
- m2.4xlarge "High-Memory Quadruple Extra Large Instance" - 68.4 GB of memory, 26 EC2 Compute Units (2 x Intel Xeon 2.67 GHz X5550 quad-core architecture = 8 virtual cores). $1.80/hour
- c1.xlarge "High-CPU Extra Large Instance" - 7 GB of memory, 20 EC2 Compute Units (2 x Intel Xeon 2.13 GHz E5506 quad-core architecture = 8 virtual cores). $0.66/hour
- cc1.4xlarge "Cluster Compute Quadruple Extra Large Instance" - 23 GB of memory, 33.5 EC2 Compute Units (2 x Intel Xeon 2.93 GHz X5570 quad-core “Nehalem” architecture = 8 cores). $1.30/hour
- cc2.8xlarge "Cluster Compute Eight Extra Large Instance" - 60.5 GB of memory, 88 EC2 Compute Units (2 x Intel Xeon 8-core architecture = 16 cores). $2.40/hour
- cg1.4xlarge "Cluster GPU Quadruple Extra Large Instance" - 22 GB of memory, 33.5 EC2 Compute Units (2 x Intel Xeon 2.93 GHz X5570 quad-core “Nehalem” architecture = 8 cores), 2 x NVIDIA Tesla “Fermi” M2050 GPUs. $2.10/hour
All of the instances listed above are 64-bit and have 1690 GB of local disk storage, except for the cc2.8xlarge instance, which has 3370 GB. An EC2 compute unit is equivalent to a 1.0-1.2 GHz 2007 AMD Opteron or 2007 Intel Xeon processor. Costs are US East prices circa March 2012.
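One way to compare these instance types is by hourly price per EC2 compute unit. The sketch below copies the figures from the list above; the table layout and function name are illustrative, and the metric ignores memory size and the GPUs on the cg1.4xlarge, which are not reflected in its compute unit rating.

```python
# (name, memory in GB, EC2 compute units, USD per hour), copied from the list above
INSTANCES = [
    ("m2.4xlarge", 68.4, 26.0, 1.80),
    ("c1.xlarge", 7.0, 20.0, 0.66),
    ("cc1.4xlarge", 23.0, 33.5, 1.30),
    ("cc2.8xlarge", 60.5, 88.0, 2.40),
    ("cg1.4xlarge", 22.0, 33.5, 2.10),
]

def cost_per_ecu_hour(instance):
    """Hourly price divided by the number of EC2 compute units."""
    name, memory_gb, ecu, usd_per_hour = instance
    return usd_per_hour / ecu

# Print the instance types from cheapest to most expensive per compute unit.
for inst in sorted(INSTANCES, key=cost_per_ecu_hour):
    print(f"{inst[0]:12s} ${cost_per_ecu_hour(inst):.4f} per ECU-hour")
```

By this measure the cc2.8xlarge is the cheapest source of raw CPU capacity in the list, followed by the c1.xlarge.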
The three cluster instances (cc1.4xlarge, cc2.8xlarge, and cg1.4xlarge) use 10 Gigabit Ethernet for very high network I/O performance. This server architecture suggests that a cloud SonATA is feasible, given ample network bandwidth out of the Hat Creek server room. It is unknown whether the AWS 10 GbE switch supports UDP multicast subscription. The cluster instances also use Intel's Hyper-Threading, which doubles the virtual core count. The three cluster instances are currently available only in the US East (Virginia) region.
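The multicast question could be tested empirically by joining a group on one instance while a second instance sends datagrams to it. The following is a minimal sketch using Python's standard socket module; the function names are illustrative, and any group address and port passed in (for example 239.1.2.3 and 50000) are hypothetical values, not part of any setiQuest or SonATA configuration.

```python
import socket
import struct

def build_membership_request(group, interface="0.0.0.0"):
    """Pack an ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def wait_for_multicast(group, port, timeout=5.0):
    """Join a multicast group; return True if any datagram arrives in time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Ask the kernel (and, via IGMP, the network) to deliver this group's traffic.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_membership_request(group))
        sock.settimeout(timeout)
        try:
            sock.recvfrom(65535)
            return True
        except socket.timeout:
            return False
    finally:
        sock.close()
```

A False result from `wait_for_multicast` while a sender is known to be active would suggest that the network between the two instances does not forward multicast traffic, although a firewall or security group rule could produce the same symptom.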
Note that EC2 instances are not configured with any virtual swap space by default.
Transferring network data between AWS servers within a geographical region is free only if the two servers are in the same availability zone. Unfortunately, determining a server's exact availability zone is impossible because the zones are mapped and distributed randomly between users. Experimentation is the only way to determine whether two servers are in the same availability zone.
On July 14th, 2011, AWS announced lower prices for outbound bandwidth and made all inbound bandwidth free. Data transfer between different availability zones in the same region is still charged at $0.01 per GB. The availability zone problem mentioned above can therefore be circumvented by placing the two servers in different regions. For example, a user's server in US West can stream unlimited data for free from setiQuest's servers in the US East region, because inbound transfer costs the user nothing, though the outbound transfer does count against Amazon's donated quota of 1 TB per month.
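The two figures in this paragraph can be turned into a quick estimate of what a given transfer volume costs or consumes. This is a sketch only: the names are illustrative, and treating 1 TB as 1024 GB is an assumption, since the donation terms do not say which convention applies.

```python
INTER_AZ_USD_PER_GB = 0.01       # same region, different availability zones
DONATED_OUT_GB_PER_MONTH = 1024  # 1 TB quota, ASSUMING 1 TB = 1024 GB

def inter_az_cost(gigabytes):
    """Charge in USD for moving data between availability zones in one region."""
    return gigabytes * INTER_AZ_USD_PER_GB

def quota_fraction(gigabytes_out):
    """Share of the donated monthly transfer quota used by outbound data."""
    return gigabytes_out / DONATED_OUT_GB_PER_MONTH

print(f"500 GB between zones: ${inter_az_cost(500):.2f}")
print(f"256 GB outbound uses {quota_fraction(256):.0%} of the monthly quota")
```

The inter-zone charge is small in absolute terms, so for setiQuest the donated outbound quota is likely the tighter constraint.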
The AWS Direct Connect service is a fast data pipe into the AWS network. Both 1 Gbps and 10 Gbps ports are supported. The rumored dark fiber out of Hat Creek could be used to create a dedicated network from the ATA directly into the internal AWS network at an Equinix access point in Silicon Valley. With enough bandwidth, multiple beamformer beams could be sent to the cloud for processing. The cost to light the dark fiber is unknown. Also, Direct Connect is not included in the AWS donation. Data transfer into AWS is free, which matches the expected usage perfectly, since the data would flow from the ATA into the cloud. The Direct Connect pricing circa November 2011 is $0.30/hour for a 1 Gbps port and $2.25/hour for a 10 Gbps port.
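The hourly port prices translate into monthly costs and theoretical data volumes as follows. This is a back-of-the-envelope sketch: the names are illustrative, a 30-day month is assumed, and the throughput figure assumes a fully saturated link with no protocol overhead.

```python
PORT_USD_PER_HOUR = {1: 0.30, 10: 2.25}  # port speed in Gbps -> hourly price from the text

def monthly_port_cost(gbps, days=30):
    """Cost in USD of keeping one Direct Connect port up for a whole month."""
    return PORT_USD_PER_HOUR[gbps] * 24 * days

def terabytes_per_day(gbps):
    """Upper bound on data moved per day at line rate, in decimal terabytes."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * 86_400 / 1e12

for speed in (1, 10):
    print(f"{speed} Gbps port: ${monthly_port_cost(speed):.2f}/month, "
          f"up to {terabytes_per_day(speed):.1f} TB/day")
```

These figures cover the port fee only; the unknown cost of lighting the fiber would come on top.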
The following setiQuest projects are running on AWS:
- Data conduit (work in progress?)
- IRC logbot (#setiquest IRC log archiving)
- setiCloud (obsolete)
- setiQuest Data (archive and web server)
- setiQuest Explorer (image data server)
- SETI Live (file receiving and rendering)
- setiQuest info service (ATA status) 
- Cloud reservation (obsolete)
References
- http://www.tedprize.org/jill-tarter/
- http://setiquest.org/forum/topic/community-meeting-2010-11-23#comment-1526
- http://ec2-downloads.s3.amazonaws.com/AmazonLinuxAMIUserGuide.pdf
- http://aws.amazon.com/free/
- http://setiquest.org/forum/topic/free-cloud-year
- http://aws.amazon.com/ec2/instance-types/
- http://aws.amazon.com/ec2/pricing/
- http://setiquest.org/forum/topic/community-meeting-2010-11-30#comment-1554
- http://aws.amazon.com/pricing_effective_july_2011/
- http://setiquest.org/forum/topic/some-brief-answers-many-questions
- http://aws.amazon.com/directconnect/#pricing
- http://setiquest.info