Nov 20

Django Dropzone Uploader

Introduction

Ever been on a trip and, upon return, needed a quick and easy way for all your friends to send you their pictures and videos without burning CDs, sending massive emails, or using third-party services? Or, maybe a better question, ever wondered how to construct a basic Django application with Amazon’s web services, for instance S3?

Look no further. Below is the basic code for a drag-and-drop Django web application that allows users to upload files directly to an Amazon S3 bucket.

Deployment Setup

The code for this project can be found on GitHub.

Before cloning or forking the source code, you’ll need Python, pip, and Git installed on your local machine, along with the AWS resources described below.

This project will write to an Amazon Web Services (AWS) S3 storage bucket, so it’s assumed you have an AWS account. If not, create one. S3 is a storage platform from Amazon, and EC2 allows you to spin up virtual servers, which you can use to host this project. If you’re new to AWS, Amazon will likely give you the first year of their smallest EC2 instance free.

This project also includes a deployment script, which allows you to easily deploy the project from your local computer to your server.

Here’s what you need to set up in AWS to ensure your account is ready to receive a deployment of this project:

  • Launch an EC2 instance running Ubuntu Server (or some other Debian-based operating system)
  • Save the .pem key pair file for the EC2 instance as ~/.ssh/myserver.pem
  • Create an EC2 Security Group that has port 80 opened
  • Create an S3 bucket
  • Generate an AWS Access Key and Secret Access Key
  • (Optional) Create an elastic IP and associate it with the EC2 instance you created
  • (Optional) Create a DNS entry of your choosing to point to the elastic IP (AWS will generate their own DNS entry that you can also use, if you don’t have your own domain name)

Fork the Code

Now you’re ready to clone, configure, and deploy the code to your EC2 server.

  • Fork the repository on GitHub
  • Clone your forked repository
  • Modify the variables at the bottom of djangodropzonetos3/settings.py to customize the application
  • You must specify valid values for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_STORAGE_BUCKET_NAME in settings.py (see the sketch just after this list)
  • Modify the HOSTNAME variables at the top of fabfile.py to point to your EC2 instance’s DNS entry
  • Modify the REPO_URL variable at the top of fabfile.py to point to your fork of the repository
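
For reference, here’s a rough sketch of what those values might look like once filled in. The variable names come from the steps above; the values and the repository URL are made-up placeholders, not real credentials:

# djangodropzonetos3/settings.py -- placeholder values; use your own keys and bucket
AWS_ACCESS_KEY_ID = "AKIAXXXXXXXXXXXXXXXX"
AWS_SECRET_ACCESS_KEY = "your-secret-access-key"
AWS_STORAGE_BUCKET_NAME = "my-dropzone-bucket"

# fabfile.py -- point these at your own EC2 instance and your fork
HOSTNAME = "ec2-12-34-56-78.compute-1.amazonaws.com"
REPO_URL = "https://github.com/yourusername/your-fork.git"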

Deploy

The fabfile.py in the repository will take care of setting up the environment for you, including installing and configuring a web server. Isn’t that handy? So you’re ready to deploy by doing the following:

  • From the Command Line at the root of the cloned source, execute “pip install -r reqs.txt”
  • From the Command Line at the root of the cloned source, execute “fab deploy”

That’s it. If this deployment is successful, you should be able to navigate to the hostname for your server in a web browser, drop and save the files, and see them stored in your S3 bucket.
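
If you’d rather confirm the uploads from Python than click around the AWS console, here’s a quick, optional sketch using boto3 (not part of the project’s code, which predates boto3; it assumes boto3 is installed and your AWS credentials are configured locally):

# Optional sanity check: list the objects that landed in your bucket.
import boto3

s3 = boto3.client("s3")  # reads credentials from your environment/AWS config
response = s3.list_objects_v2(Bucket="my-dropzone-bucket")  # your bucket name here
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])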

Now, start poking around in the code to learn the ease and awesomeness of Django and how this was accomplished! Leave your thoughts in the comments section below!

Apr 23

DD-WRT NAT Loopback Issue

Introduction

NAT loopback is what your router performs when you try to access your external IP address from within your LAN. For instance, say your router forwards port 80 to a web server on your LAN. From an outside network, you could simply visit your external IP address from a browser to access the web server. Internally, if NAT loopback is disabled or blocked, you would not be able to reach the web server that same way.

There are any number of valid reasons why you’d want to allow NAT loopback on your network. If you’re like me, you simply want internal and external access to operate in the same way. NAT loopback is needed to accomplish this, and it is simple and safe. Don’t be fooled by the plethora of forum posts crying that NAT loopback is disabled on routers purposefully, that it opens up dangerous security holes, or that it will destroy your network and ultimately your livelihood as you know it. Like the vast majority of scare tactic-based content on the internet, it’s false. Your router will not stab you in your sleep if you allow NAT loopback … although it may emit higher levels of radiation, lace your lipstick and food with carcinogens (compliments of the government, of course), and kill Brad Pitt. Again. Coincidentally, the posts never specify why the claims might be true, lack credible sources, and are rarely found outside of back alley forums. We’re still talking about NAT loopback, right? The internet has made us so gullible …

The primary reason for the security concern is that some consumer routers disable NAT loopback by default, and there is no way around this with stock firmware. That isn’t a deliberate security barrier, though; it’s just a constraint of limited stock firmware. Nothing new there. The simplest solution to this is, as usual, to flash DD-WRT to your router. Then, follow this tutorial to allow NAT loopback.

Implementation

Before proceeding, ensure NAT loopback actually doesn’t work with your version of DD-WRT. Different versions of DD-WRT implement NAT with slight variations, so it’s possible your version may not actually need the special rules below.

To check if NAT loopback is working on your router, you’ll need your external IP address. If you don’t know your external IP address, just Google “what is my ip”. Now, open a Command Prompt and ping your external IP address. If the command times out, NAT loopback is not working.

In the DD-WRT Control Panel, navigate to the “Administration” tab and click on “Commands”. Add the following rules, then click “Save Firewall” to ensure the rules execute even after the router is rebooted.

# Load the packet-marking modules (one or both may already be built into your kernel)
insmod ipt_mark
insmod xt_mark
# Mark packets arriving from the LAN (any interface other than the WAN) that are
# destined for the router's WAN IP address
iptables -t mangle -A PREROUTING -i ! `get_wanface` -d `nvram get wan_ipaddr` \
-j MARK --set-mark 0xd001
# Masquerade the marked packets so replies route back through the router
iptables -t nat -A POSTROUTING -m mark --mark 0xd001 -j MASQUERADE

Conclusion

That’s it! Now, try pinging your external IP again from the Command Line. This time you should receive packets.

DD-WRT is always evolving. The developers have stated that they aren’t planning on fixing this issue, but if this procedure doesn’t work for you, leave a comment below and I’ll check to see if something has changed in the latest version of DD-WRT. I’ll try to always keep the tutorial updated with instructions for the latest DD-WRT build.

Also, if you previously followed my DD-WRT Guest Wireless tutorial, this fix should work for both interfaces.

Mar 22

DD-WRT Guest Wireless

Introduction

If you’ve done any amount of work with routers, you know that it doesn’t take long to start craving consistency, and more advanced functionality than the cheap home interfaces grant you. This is the point where you usually break down and start researching things like Tomato, OpenWrt, and DD-WRT, just to name a few of the more popular alternatives.

These alternate firmwares don’t just provide a consistent administrative experience across all compatible models and brands, they also turn a cheap home router into a flexible and competitive enterprise router.

My Setup

DD-WRT is my personal firmware of choice. Powerful, flexible, and stable. One thing that I demand in a router is the ability to broadcast a secondary SSID so my guests can access wireless internet in my home without also having access to my entire network of computers and devices.

Because my router’s stock firmware was extremely slow and buggy, I gladly flashed my Cisco E2500 router with the “mini” DD-WRT firmware (the E2500 also supports the “big” firmware). But after getting the two wireless networks set up on my router, it was brought to my attention that there are no good tutorials for exactly how to do this using DD-WRT. The tutorial provided on their own website, in fact, does not work. So, it falls upon me to put out my particular configuration for two mutually exclusive wireless networks from a single router, both networks having access to the WAN port (that is, internet access). There are, of course, multiple ways to do this. Feel free to leave alternative suggestions in the comments.

Create Two Wireless Networks

First, create your wireless networks by clicking on “Wireless” and then “Basic Settings”. We’ll set up security in a moment. After you’ve configured your private wireless network, click “Add” under “Virtual Interfaces” to add the “wl0.1” SSID. Give your guest network a separate SSID, and select “Enable” for “AP Isolation”.

Now click “Save” and “Apply Settings”.

[Screenshot: ssid]

Setup Wireless Security

Navigate over to the “Wireless Security” tab. After you’ve set up the wireless security for your private network, set up similar security for your guest SSID. I would advise against leaving your guest wireless completely open, but since you’re going to be giving out this password to your guests, it should probably be a little simpler than your private network’s key.

Now click “Save” and “Apply Settings”.

[Screenshot: security]

Create Bridge

At this point, you have two wireless networks broadcasting on two separate SSIDs. Both networks should have internet access, but you’ll also notice both networks dish out IPs in the same subnet, and both networks are clearly able to see each other. While you may like and trust your guests, that doesn’t mean you necessarily want them to have access to all your network devices. To separate the network routing, we need to create a bridge and place the guest network into a different subnet.

Click on “Setup” and then on the “Networking” tab. Under “Create Bridge” click “Add” to add a new bridge. Give the bridge a name, and modify the IP address of the bridge to be in a different subnet than your private network. For example, my private network grants IPs in the subnet 192.168.1.0/24, so my guest network in the image below is set up to grant IPs in the subnet 192.168.2.0/24.

Now click “Save” and “Apply Settings”. Though the page may refresh right away, you may need to wait about a minute before the bridge is available to use in the next few steps.

[Screenshot: create-bridge]

Assign Guest Network to Bridge

Under “Assign to Bridge” click “Add”. Select the new bridge you’ve created from the first drop-down, and pair it with the “wl0.1” interface.

Now click “Save” and “Apply Settings”.

[Screenshot: assign-bridge]

Create DHCP Server for Guest Network

We’re almost there! We’ve created a bridge in an alternate subnet, but the alternate subnet doesn’t have a DHCP server, so our guests currently cannot access the guest SSID (unless they assign themselves a static IP). Scroll to the bottom of the “Networking” page and under “Multiple DHCP Server” click “Add”. Ensure your newly created bridge name is selected from the first drop-down menu.

Now click “Save” and “Apply Settings”. Congratulations, we now have a working, separate guest network! Unfortunately, while users can connect to the network and DHCP is running, guest users aren’t able to access the internet quite yet.

[Screenshot: bridge-dhcp]

Create Firewall Rules for Guest Network

Navigate to the “Administration” tab and click on “Commands”. We need to add three rules to our firewall settings before our private network is completely secure and our guest network has internet access. Add these three rules (one per line) to the “Commands” text field, then click “Save Firewall” to ensure the rules execute even after the router is rebooted.

# SNAT all outbound WAN traffic to the router's WAN IP (this is what gives the guest bridge, br1, internet access)
iptables -t nat -I POSTROUTING -o `get_wanface` -j SNAT --to `nvram get wan_ipaddr`
# Allow new connections originating from the guest bridge
iptables -I FORWARD -i br1 -m state --state NEW -j ACCEPT
# Block new connections from the guest bridge into the private bridge (br0)
iptables -I FORWARD -i br1 -o br0 -m state --state NEW -j DROP

[Screenshot: firewall]

Improve Guest Security

Pete Runyan commented with a few more ways to nail down the security of the guest network. For one, your guests likely assume that their device on the guest network is not accessible from other devices on the same network, so you’ll want to add the firewall rules below to make that true. It’s also probably unnecessary (depending on your needs) to allow users on the guest network SSH, Telnet, or GUI access to the router. Append these firewall rules to harden the security of all of your networks!

# Block new connections from the private bridge (br0) into the guest bridge (br1)
iptables -I FORWARD -i br0 -o br1 -m state --state NEW -j DROP
# Reject guest access to the router's Telnet, SSH, and web (HTTP/HTTPS) management interfaces
iptables -I INPUT -i br1 -p tcp --dport telnet -j REJECT --reject-with tcp-reset
iptables -I INPUT -i br1 -p tcp --dport ssh -j REJECT --reject-with tcp-reset
iptables -I INPUT -i br1 -p tcp --dport www -j REJECT --reject-with tcp-reset
iptables -I INPUT -i br1 -p tcp --dport https -j REJECT --reject-with tcp-reset

Conclusion

You should now have two working SSIDs: a private one for your home network, and a guest network for your visitors. Both networks should have internet access. The private network will function the same as a LAN and single wireless network did before, with the wireless network having full access to the LAN connections. The guest network, on the other hand, is separated from the private network. Additionally, each individual device on the guest network is isolated from the others, so guests cannot see each other.

If you’ve gotten to this point and something is not working, or your guest network does not have internet access, don’t be alarmed. DD-WRT is always evolving, and it’s entirely possible bridge settings or firewall rules for the latest build have changed. If this tutorial does not produce the desired result, please leave a comment below. I’ll try to always keep the tutorial updated with instructions for the latest DD-WRT build.

Important!

If you are using DD-WRT and experiencing issues with NAT loopback (accessing your public IP address from within your network), I have a tutorial to help resolve that issue here.

Aug 04

Geocentral Location; Addresses to Coordinates

Recently, I needed to plot numerous addresses on a map and, ultimately, find the geocentral location of all addresses. The geocentral location is the weighted center of all the addresses, which can be useful in helping determine numerous things, including the average distance between all addresses and some other location.

The geocentral location is attained through relatively simple vector math. Let’s say, for instance, you have a set of points on a graph. Adding each point together and dividing by the number of points gives you the weighted center of all the points, which can help you determine quite a bit about how that population of points interacts with you or each other.
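
The tool below does the geocoding and plotting through the Google Maps API, but the centroid math itself is trivial. Here’s a minimal Python sketch of just that step, assuming the addresses have already been converted to latitude/longitude pairs (the function name is mine):

# Compute the geocentral location (centroid) of a set of coordinates.
def geocenter(coords):
    # coords is a list of (latitude, longitude) tuples
    if not coords:
        raise ValueError("need at least one coordinate")
    avg_lat = sum(lat for lat, lng in coords) / len(coords)
    avg_lng = sum(lng for lat, lng in coords) / len(coords)
    return avg_lat, avg_lng

# Example: three points roughly around Chicago
print(geocenter([(41.88, -87.63), (41.97, -87.66), (41.79, -87.60)]))

Averaging raw latitude and longitude is a fine approximation when the points are reasonably close together; for addresses scattered across the globe you’d want to average 3D unit vectors instead.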

I’ve put together a simple script below that interacts with Google Maps to do just that. Input a list of addresses in the text box below; the script will attain the coordinates for each address and plot each address, along with the addresses’ geocentral location, on the map below.

 

A few things to keep in mind:

  • One address per line
  • Addresses must be properly formatted
  • Ensure no address lines are blank
  • The geocentral location is marked with a blue flag
  • In order to keep strain on my server low, the tool below only allows 150 or fewer addresses to be processed. The source is available on GitHub here, so you’re welcome to modify the tool for use on your own server.


If you are unable to use this tool, either an address is malformed or Google has changed a part of their Maps API. If you’re certain that all of your addresses are properly formatted and the tool still does not appear to work for you, send me an email so I can check whether Google has updated their Maps API.

Jul 28

A Correction for the WSJ: So, Who Did Invent the Internet?

Recently, Gordon Crovitz wrote an opinion piece for the Wall Street Journal titled Who Really Invented the Internet? Fortunately, it’s only an opinion piece, because there was little more than opinion, littered with plenty of misinformation, in the writing. You can read the article here.

Now, it’s not like I look to the WSJ for the latest technology information (or, in this case, technology history). Far from it. And generally when a here’s-the-truth-you-never-knew article starts with political propaganda, it’s pretty safe to assume that whatever comes next is going to be absurd. The article’s introduction could essentially be summarized as, “Obama said something that was true, but I’ll be darned if I can’t find a way to make it sound false!”

Even still, to those of us in the technology field, the “first computer” and “who invented the internet” discussions are highly revered and hotly debated, so when someone not in the industry starts boasting that they have a complete and final answer to these discussions, we usually just scoff. In Crovitz’s defense, he seems to be confusing “internet” with “World Wide Web” and many other terms that merely relate to networking and computers. But that’s about as far as I’d go to defend him; he’s a conservative author trying to make something out of nothing just because a liberal said it.

 

Due to the fact that I’m more than a little OCD, I wound up relating the history of internet technology through the ages to my Grandpa, who originally sent me the Crovitz article. Many of the details below are in response to specific parts of Crovitz’s article, so, as painful as it may be, I recommend you read that article first. Alright, ready? Begin.

 

Personal Computer: The term “personal computer” was not coined until 1975, for the Altair 8800. However, it is highly disputed whether Xerox created the first “personal computer”, by whatever modern definition you use. IBM released its first commercial electronic computer, the IBM 701, in 1953; Digital Equipment Corporation released its first computer, the PDP-1, in 1960; and Hewlett-Packard released the HP 9100A, an early mass-produced desktop computer, in 1968.

Personal Workstation: This is the term the WSJ author is looking for in their article. The first personal workstation, a “workstation” being a computer that can be connected to another computer (in this case, through the Ethernet technology he referenced), was created by Xerox in 1974. However, the computers used by ARPANet were technically also workstations, just not mass produced.

Intranet (take special note of the “a”): A connection between two or more computers within the same network. The network in your house is an “intranet”.

Internet (take special note of the “e”): A connection between two or more networks. The wires that connect your house’s network to mine are the “internet”.

ARPANet: The first computer network (or “intranet”), created by the Department of Defense, which was fully implemented in 1969. I’ve never heard it associated with nuclear strikes or anything of the sort. It was created merely to replace slow and overused satellite communication between government agencies. When originally created, it did not use TCP/IP; it used NCP.

DNS: DNS stands for “Domain Name System”. It’s interesting that, for an article claiming Ethernet was more defining to the internet than TCP/IP, the article makes no mention of DNS, the third essential component of the modern internet. Though you type in “google.com” to get to Google, Google’s website actually lives at an Internet Protocol (IP) address of 173.194.34.165. This IP address is similar to a street address. People cannot be expected to remember an IP address for each of their favorite websites, so DNS was invented to resolve a host name (google.com) to an IP address. This is similar to me saying “Ben and Jerry’s on Navy Pier” instead of “Ben & Jerry’s – NAVY PIER, 700 East Grand Ave., Chicago, IL 60611-3436”.

RFC: RFC stands for “Request for Comment”. The article does not mention these, but they are crucial to understanding when things were adopted. They’re sort of like the Congressional bills of the technology world, but more well-defined. RFC documents are official definitions of technological protocols or interfaces. When something is adopted as a standard, an RFC fully defining it is written, and, if other people want to interface with it, they use that “law” to know how things work. The very first RFC, RFC 1, was called “Host Software” and dictated the infrastructure of ARPANet. RFC 791 was for TCP/IP in 1981. RFC 894 was for Ethernet in 1984. RFC 1035 was for DNS in 1987. These dates do not necessarily correspond to when the interfaces were created, but they do indicate when the interfaces were standardized and/or adopted.

World Wide Web: The World Wide Web was formally introduced in 1989. The World Wide Web is, in very loose terms, the combination of HTTP, HTML, and database communication that transfers web content by a standardized means to a web browser.

 

Difference Between Intranet and Internet

So, what is the difference between an “intranet” and the “internet”? First of all, the foundational structures of the “internet” are identical to the “intranet” (that being the TCP/IP referenced in the article). Once there was the possibility for the intranet, the possibility for the internet also existed, but it was not realized until a bit later, which is why Xerox is trying to claim credit for that. It’s a chicken-or-the-egg argument. Naturally, each company (and the Pentagon) claims different loose definitions of all these terms so that they can claim credit for actually inventing the end result. The fact is, none and all of them invented it … which coincides with Obama’s remarks pretty well, if you ask me.

 

TCP/IP and Ethernet

First of all, it’s sad that the article references Vinton Cerf but makes no mention of Bob Kahn. They collaborated together to define TCP/IP, but Kahn rarely gets the credit he deserves. Kahn was actually the one with the idea of TCP/IP, and Cerf was in charge of the implementation and later the RFC definition.

Secondly, it should be highly suspect that much of the WSJ author’s claims come from a book written about Xerox. More significantly, after the WSJ article was published, the author of the cited book released a statement refuting the article and saying the article misrepresented the content of his book.

Naturally, Xerox will claim “full credit” for a discovery, as many other companies have done as well, but they cannot claim full credit as they utilized standards that had already been put in place (namely TCP/IP). However, their contribution to the internet’s development was equally strong. Ethernet was merely a communication standard that allowed passing data (at very high speeds) between two computers using TCP/IP. Neither technology would ever have been adopted by the private sector (and ultimately the world) without something like …

 

DNS

The Domain Name System was invented in 1983, and the internet would not exist without it, just like TCP/IP and Ethernet. I won’t go into details of why it was necessary, but it was created when issues were seen in how hosts were resolved with ARPANet. It was obvious that as ARPANet got larger, the way hosts were resolved (me asking, “Hey, what’s Mom and Dad’s address?”) would become weaker and weaker (and certainly slower and slower). So they decentralized their host resolution to several Domain Name Systems rather than a centralized location at the Pentagon. This was essentially the birth of the privatized internet, as we know it, but that is not to discredit its foundations.

 

So Did Xerox Invent the Internet or Not?

Short answer? No. Xerox has never been one of the contenders in the “who invented the internet” discussion within knowledgeable circles.

Long answer? It’s a bit arrogant for Xerox (or any one company or government organization) to accept or take full or even majority credit for the invention of the modern day internet. It was a combined effort of multiple unrelated parties, companies, and government entities. People usually credit the Department of Defense with the creation of the internet because, well, they created the first internet. And without the funding and research for TCP/IP, the advancement toward what we have today would have been much slower (assuming it ever happened at all). Additionally, though Xerox coupled TCP/IP with their own technology to make Ethernet, they did not use Ethernet on the internet. They used it on their own intranet, or internal network, because at the time only government organizations had access to the internet. More importantly, TCP/IP and other internet protocols could exist outside of an internal network, which is where Ethernet is used. Ethernet is used to join computers to an intranet, not to join networks to the internet. Xerox’s contribution certainly increased the speed and reliability of internal network communication, but that is an indirect contribution to the internet. It is not an essential part of the components that make up the internet.

 

What About the Privatization of the Internet?

The reason the internet became privatized had little to do with little government/big government politics, as the WSJ implies, and everything to do with decentralization. The fundamental structure and combination of TCP/IP and network-to-network communications led to DNS, and once DNS was introduced it became obvious that the internet was going to become a worldwide tool that could not be contained or centralized by any one government or entity. Ironically, however, the U.S. government did still control all the DNS servers, and government organizations were the only ones with access to the internet.

Though Xerox enabled reliable intranet communications with Ethernet (which, by the way, was given back to the government for their use primarily), ARPANet expanded to become the internet, and DNS offered the potential to use the service around the globe, it was not commercialized. It was not until 1992, when Congress passed a bill (spearheaded by Al Gore, which is usually why people misquote him to make the joke in which he claims to have invented the internet), that commercial access to the internet was allowed. This began the privatization of the internet, but the government still controlled all DNS servers.

For six more years the internet was essentially still controlled by the U.S. government, but commercial entities were allowed to use it. In 1998 (not sure what event the article is referring to when it says 1995), the Clinton administration issued a mandate to form a non-profit organization called the Internet Corporation for Assigned Names and Numbers (ICANN). The U.S. government gave control of all DNS servers, maintenance, and documentation of internet infrastructure to ICANN. And you thought Google owned the internet. At that point, the internet became officially and completely privatized.

 

Doesn’t Britain Claim They Invented the Internet?

Actually, no. If you watched the Olympics 2012 Opening Ceremonies, Tim Berners-Lee was paraded through the stadium and loudly proclaimed as the “inventor of the World Wide Web”. And there’s the distinction. London never claimed he invented the “internet”. There is a difference. The “internet” and the “World Wide Web” are two distinct things, though they obviously operate together and are essentially synonymous to the average internet user today.

In 1989, Tim Berners-Lee had an idea for a database of hypertext links. Berners-Lee implemented what he called the World Wide Web with the collaborative help of Robert Cailliau. It didn’t take long for the two of them to realize the potential the World Wide Web could offer to the internet, so in late 1990 Berners-Lee developed the pieces necessary to transmit World Wide Web data across the internet: the HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML). Along with this, he developed the first web browser, which he called simply WorldWideWeb. Joining HTTP, HTML, and a browser with the internet gave Berners-Lee the ability to pass much more valuable data from point to point, displaying that data in a specifically intended way to the end-user.

In regards to the WSJ article, it’s also possible that the author of the WSJ was confusing the term “internet” with “World Wide Web”. By 1994, better graphical browsers had been created, and the World Wide Web standard had pretty well been adopted, but primarily only by universities and research labs. In late 1994, Berners-Lee founded the World Wide Web Consortium (W3C), which maintains many of the standards for the World Wide Web still today. After W3C was founded, and in early 1995, the potential the World Wide Web coupled with the internet had to offer the commercial world became apparent, and the internet really started taking off.

 

Conclusion

Even still, the Department of Defense, Vinton Cerf, and Bob Kahn do deserve full credit for the creation of the first intranet/network and the initial ideas for networking protocols. The natural successor to that was Ethernet, DNS, and ultimately a privatized and distributed internet as we know it today.

Here’s a simpler example to help with the comparison. Assume for a moment that, prior to Henry Ford, nobody had ever done anything with a vehicle that moved (without assistance from an outside force) from point A to point B. Ford created the Quadricycle as his first vehicle. He then adapted that into the Model T. Is the Model T any more or less of a vehicle? It has more of the parts that we’re used to today, and it was certainly much more luxurious. But to say then that, because the Model T is more like what we have today, the Quadricycle was not a vehicle is silly. The Quadricycle was still a vehicle that moved you from point A to point B. The Model T was the natural successor to that, and cars have progressively become more and more advanced (with newly invented technology added to them) as society has advanced.

In the same way, ARPANet moved network information from point A to point B. The internet was the natural successor to an intranet, but the same ideas and fundamental technology were used for it, so it is safe to say that the government formed what has become the internet. Which, I believe, was President Obama’s point. No argument here that the internet boom came in 1998 when it was fully privatized, but the internet also would not have been established in the first place without government research and funding.

Jul 14

The Napster Revolution

I’ve recently been reading through Steve Jobs’ biography, a phenomenal work by Walter Isaacson. A point that Isaacson keeps coming back to throughout the book is that Steve Jobs revolutionized six different industries: animated movies (through Pixar), personal computing, tablet computing, phones, digital publishing, and music.

I don’t disagree with Isaacson. Jobs did revolutionize the way that digital media (including music, movies, books, and more) is marketed and sold today. But before you can have the corner on the market, there needs to be demand. And the revolution that realized the screaming demand for easily accessible digital media around the globe started in a college dorm room during the summer of 1999.

 

The Beginning

It started with two adolescents, Shawn Fanning and Sean Parker, who shared a mutual interest in hacking and programming. Though the two never met in person at this point, they chatted over IRC in the years to come, bouncing various software ideas off each other.

During his Freshman year of college at Northeastern University in Boston, Fanning had an idea to simplify online music acquisition for him and his roommate. It was 1998, and the easiest way to download MP3s was through various websites. Each website had a different interface. Each a different library of music. Many broken links. All were very slow.

Fanning wrote a piece of software that fixed this. It provided a single, clean interface that searched all the major MP3 websites, providing results only for working links. It was effective. But it still wasn’t a comprehensive library.

By the end of his Freshman year, Fanning had dropped out of college and was mulling over ideas for a music sharing program that didn’t rely on limited libraries and websites that were taken down and relaunched on a weekly basis. He worked out the good and bad ideas for such a program with his internet buddy, Parker, over IRC, slowly growing more confident in his idea and its architecture.

By midsummer, 1999, Fanning sat down at his uncle’s for a sixty-hour programming spree, and it was during those sleepless hours that Napster was officially born.

 

The Architecture

His idea was simple enough. All he needed to do was combine three existing protocols into one client: computer-to-computer connectivity (which was accomplished in instant messaging clients like IRC), file sharing (which was implemented in many instant messaging clients and exhibited in operating systems like Windows), and advanced search (which was illustrated by MP3 and internet search engines).

Fanning had already implemented two of the three features in the MP3 search program that he wrote during his Freshman year of college. The third feature, computer-to-computer connectivity, was the innovation that led his first program to become Napster.

The issue with Fanning’s first program was the same issue independent MP3 websites had: it relied on the servers of third-party websites that were frequently taken down for a number of reasons. Using computer-to-computer connectivity, Napster utilized each user’s computer as a server on the Napster network. Rather than searching the server of a website, Napster searched the computers of users that were currently logged onto the network.

There still was a centralized server for Napster—which is what eventually led to the service’s downfall—that indexed MP3 files and their locations. This allowed Napster to still provide very rapid search functionality.

Amazingly, in its two years of operation, the centralized server for Napster never went down. Not once.

 

Sharing and Searching

Napster did not blindly search a user’s entire computer for MP3 files—Fanning was originally a hacker, but he still understood privacy. Nor was Napster able to search a client’s computer if the Napster client was not running. So how did a user’s music library become part of the Napster network?

  1. The user would need to install the Napster client on their computer
  2. The user would need to share a specific folder on their computer
  3. The user would need to have the Napster client running

Assuming these three criteria were met, any MP3 files within the user’s shared folder would be indexed on the centralized Napster server and available for download by other logged on users.

Any other user using the Napster client could then browse for songs by artist, song, album, etc. The search would be indexed through the centralized Napster server, and results returned from the index would be shown to the user. When a user selected a song for download, the Napster server would return the IP address of the user’s computer that contained the desired song, connect the two users’ computers, and transfer the file.
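
To make that flow concrete, here’s a purely conceptual Python sketch of the centralized-index model described above (the class and names are mine, not Napster’s): the central server only knows which logged-on user has which song, and the file itself moves directly between the two peers.

# Conceptual sketch of a centralized index: the server stores locations, not files.
class CentralIndex:
    def __init__(self):
        self.index = {}  # song name -> set of peer addresses sharing it

    def register(self, peer_addr, songs):
        # Called when a client logs on and shares its folder
        for song in songs:
            self.index.setdefault(song, set()).add(peer_addr)

    def search(self, song):
        # The server answers searches but never touches the MP3 itself
        return sorted(self.index.get(song, set()))

index = CentralIndex()
index.register("192.168.1.10", ["song_a.mp3", "song_b.mp3"])
index.register("192.168.1.20", ["song_b.mp3"])
print(index.search("song_b.mp3"))  # the download then happens peer to peer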

 

The Rise …

After Fanning’s sixty-hour programming marathon, Napster was born. It was June of 1999, and Fanning and Parker released the beta of Napster to thirty of their friends. It was meant to be a small group for testing. But obviously, given the architecture described above, the more computers that used Napster, the larger the Napster library would be. Fanning and Parker’s friends saw this potential, and less than seven days later, the purposely small test group had spread the download from thirty to 15,000 users.

Its users unaware of the legal implications, Napster went viral. Less than a year from its release, Napster was the fastest growing website in history and had acquired over 25 million users. This growth rate was unprecedented and was a surprise to everyone except Fanning, Parker … and anyone that used the service. Napster was wildly popular on the internet for two blissful years. Before Napster’s user base started to decline (due to the hot legal attention it was receiving), the service peaked at 80 million registered users.

Fanning believed his idea would be popular. But he had no idea of the demand that it would generate. Prior to the release of Napster, digital media was not easily accessible to the general public. Napster opened our eyes to the convenience we could be affording. Unfortunately, the convenience Napster offered was relatively short lived. The Recording Industry Association of America (RIAA) had taken its focus off nearly every other legal dispute it had to focus its crosshairs squarely on Napster.

 

… And Fall

How much damage (if any) Napster did to the music industry will be a topic of debate that will never find a good answer. While the RIAA may point out that, at its peak, Napster shared roughly 2.79 billion MP3s per month among its users, others would tell you that a song downloaded for free does not always correlate to revenue lost. A statistician on the other side of the argument might point out that, during the year Napster was most popular, revenue for the music industry increased by $500 million. Neither of these facts provide hard evidence for either side of the case, but they make for good argumentation.

Regardless, the service Napster provided was solely free MP3 distribution, and there’s no doubt that the means by which Napster did this violated copyright law. The RIAA, along with major record labels, artists, producers, and other corporate giants, banded together to file litigation against Napster. The litigation itself wasn’t overly complicated, and the Napster company effectively dissolved in July 2001, two years after launching the Napster service, one year after injunction.

But the Napster rise, fall, and lawsuit were extremely interesting. No, the litigation itself was nothing to write home about. It was the companies sponsoring the litigation, as well as Napster, that illustrated both the irony of the situation and the need for something like Napster with a legal face. Because many of the same companies that sponsored the litigation against Napster, and even sued Napster itself, were the same companies that had funded (and continued to fund, even after the injunction) Napster.

While the litigation departments of media companies around the world were building cases against Napster, the software departments were integrating components of Napster into their own applications. AOL, Yahoo, and Microsoft, for instance, each introduced instant messaging clients that had a Napster button on every chat window, which allowed you to quickly share a song with a friend. You may recall that AOL merged with Time Warner in late 2000, which caused Warner Music to be renamed to Warner Music Group. Point being, Warner Music Group was one of the many groups involved in litigation against Napster, but their parent company AOL Time Warner was funding the very company they were suing.

AOL was not the only house divided. German media giant Bertelsmann saw the potential in Napster, but they also saw how susceptible it was to legal disputes. So they invested $85 million into the company, asking them to develop a better, more secure distribution system. All this, even while Bertelsmann’s media division was also funding the RIAA and its lawsuit against Napster. And finally, when the dust was still settling in early 2002, Bertelsmann offered to purchase Napster for $20 million. The offer was rejected, and Napster quickly disintegrated as its employees (and executives … and board) took their severance pay and fled at the sight of bankruptcy. For as spectacular as the formation of Napster was, the day the company finally closed its doors was downtrodden and quiet.

 

The Gnutella Network

The end of the Napster service did not end the Napster idea. And even though Bertelsmann offered Napster $85 million to develop a secure distribution system that the company never had time to develop, someone else did: two people named Justin Frankel and Tom Pepper, co-founders of Nullsoft, a small software company recently purchased by none other than AOL. Justin and Tom developed a more robust and secure peer-to-peer file sharing network, and they called it Gnutella. Mind you, this wasn’t years after the RIAA smashed Napster into the ground. They began working on their alternative to Napster in 2000, and the Gnutella network began to catch the public’s eye in early 2001, when Napster’s legal battles were ramping up.

To AOL’s credit, they did try to stop Gnutella from growing up and living a long and healthy life. The day after the Gnutella source code was publicly released on Nullsoft’s website, AOL demanded the project be shut down. But, of course, it had already been downloaded thousands of times, and it was already being redistributed on countless sites. So AOL’s move to pull it off their servers was said to be similar to closing the barn door after you let the horse out.

The Gnutella network, unlike Napster, was not a client. It was both a protocol and an idea. The downfall of the old MP3 sites was that both their index server and their libraries were centralized servers owned by the sites. The downfall of Napster was that, though their libraries were on their users’ computers, they still had a centralized server that indexed all MP3 files and the computers on which they were stored. The Gnutella network removed all centralized servers and instead used each user’s computer as a server and also a relay. The relay was what acted in place of a centralized index server. And since the Gnutella protocol was open source, anyone could make a client that connected to it. And there are … many.

When you logged on to the Gnutella protocol using a Gnutella client, for instance LimeWire or Morpheus, the protocol on your computer would ping several other computers that it thought might be logged on. Each of those computers also had a list of computers they knew were logged on, so they would return that list to your Gnutella client. Once Gnutella found other logged on users, it would remember those addresses the next time you started the service—this way if one of the servers did go down for any reason (even copyright violation), it still had other alternatives. When you searched for a file on Gnutella, it would send the request down the chain of clients you were attached to, and clients attached to those clients, until it found a match.

In this way, the Gnutella network was completely distributed. There were no centralized servers, so there was nothing for copyright holders to seize when an infraction was suspected. Sure, they would seize your computer, since it was one of Gnutella’s servers. But there were millions of other servers out there just like you. And for this reason, the Gnutella network has never been (and likely never will be, as its effectively impossible) shutdown; it has only grown since its inception. Certain clients have legally been shutdown before, but since they are open source, they would simply reemerge a few days later.
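
As a rough illustration of that distributed search (the structure and names here are mine and greatly simplified compared to the real Gnutella protocol), each node forwards a query to the peers it knows about until a time-to-live runs out, and any node holding the file answers:

# Simplified Gnutella-style flood search over a peer graph.
def flood_search(neighbors, shared_files, node, title, ttl=4, visited=None):
    visited = visited if visited is not None else set()
    if node in visited or ttl == 0:
        return []
    visited.add(node)
    hits = [node] if title in shared_files.get(node, set()) else []
    for peer in neighbors.get(node, []):
        # Forward the query down the chain with a decremented time-to-live
        hits += flood_search(neighbors, shared_files, peer, title, ttl - 1, visited)
    return hits

neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
shared_files = {"D": {"song.mp3"}}
print(flood_search(neighbors, shared_files, "A", "song.mp3"))  # ['D']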

 

Modern Distribution

The Gnutella network today is the most widely used peer-to-peer distribution network (aside from torrenting). Though illegality popularized these distribution systems, they are primarily used for legitimate transfers today, though obviously they do still house illegal content.

More importantly, however, the digital media revolution that Napster started, however controversial it was and is, finally forced the media sources to reevaluate demand for their products. Piracy initially caused CD sales to plummet and thus the music industry to lose money. But once key distributors like Apple, Amazon, and even eventually Napster again (purchased by Rhapsody and reintroduced legally for a fee) finally saw the demand that Napster enlightened them to, the music industry recovered (though they’d like you to believe they’re still limping along). Sure, CD sales have all but died, and some stores like Best Buy don’t even carry CDs, but digital sales have surpassed what CD sales used to be. The digital revolution also opened new possibilities. Pandora, for instance, offers you a digital streaming alternative to your radio.

But the demand for digital content didn’t stop at music. Companies like Netflix, Hulu, and Amazon Instant emerged. Products like the iPad and Kindle are hugely popular. And TV stations started streaming their content online. Even non-internet-based companies like Redbox were formed based on consumers’ desire for on demand content.

 

Now, I’m not condoning illegal activity. And downloading music or movies from LimeWire or The Pirate Bay is very bad, kids. But there is a lesson to be learned here. When the culture begins to change, and the culture realizes a new possibility that never existed to them before is now a reality, don’t resist it. The amount of money the record labels and RIAA put into legislation and litigation before they even considered changing with the culture to provide legal alternatives to services such as Napster far surpassed the money they were losing in record sales. When the culture advances, advance with it. That’s what technology is all about.

 

Unless the culture advances into a murdering machine. That should still be frowned upon.

Apr 23

Reagan.com Email is a Misguided Effort

I heard a commercial with the booming and illustrious voice of Rush Limbaugh. After I recovered from banging my head against my desk, I reflected on what was said in the commercial.

Rush pointed to the popular free email providers (Yahoo, Google, and others) to remind you that they scan your email. To remind you that they sell your email address, and other information about you, to the highest bidder. To remind you that the use of these free email addresses may increase your risk of spam mail. In contrast, purchasing an email address from Reagan.com provides you with private and secure email, and your information will never be sold.

I was intrigued.

I found that Rush was not the only conservative advertising this service. Fox, CBS, and many others also endorsed it, though for slightly different political reasons; they primarily portrayed it as an email alternative “for conservatives”. They said that, unlike these free services, Reagan.com email would not have you unknowingly contributing to “the liberals”. These are hard-and-fast definitions, people.

Michael Reagan, founder of Reagan.com and son of, you guessed it, Ronald Reagan, has this to say about his service:

[…] every time you use your email from companies like Google, AOL, Yahoo, Hotmail, Apple and others, you are helping the liberals. These companies are, and will continue to be, huge supporters financially and with technology of those that are hurting our country.

Because apparently liberals are the only ones that are interested in using technology to advance our country. And apparently “the liberals” are the only people benefiting from these huge corporations. Obviously, they would never help “the conservatives”. Regardless, this is a relatively empty claim, as it’s never actually substantiated.

 

Politics aside, allow me to explain to you from a technical perspective why the commercials endorsing Reagan.com and even the information on Reagan.com is largely misleading.

First, let’s address the script Rush was fed in his advertisement. It is well known and accepted that free email providers, along with many paid internet providers as well, will harvest and sell your information to advertising companies. It’s well known because these companies clearly state this in their Privacy Policies. The claim is that the Reagan email service, which costs you $40 per year, does not do this. However, if you read through the Privacy Policy for Reagan.com, while it is true that Reagan.com says they will not collect your information, they do allow their affiliates to collect it.

We may also use one or more advertising network providers to help present advertisements or other content on this website. These advertising network providers use cookies, web beacons, or other technologies to serve you advertisements or content tailored to interests you have shown by browsing on this and other websites you have visited. Advertising network providers collect non-personally identifiable information such as your browser type, your operating system, web pages visited, time of visits, content viewed, ads viewed, and other click stream data.

The key phrases here are that their “advertising network providers” have the right to collect information about “content viewed”. I don’t know about you, but the content I primarily view while logged onto my email is … email.

The use of cookies, web beacons, or similar technologies by these advertising network providers is subject to their own privacy policies, not our privacy policy for this website or its Service.

Reagan.com uses the affiliate networkadvertising.org for their ads (why they show ads on a service they charge for is beyond me). Ironically, if you look through the list of partners of Network Advertising, four companies may quickly jump out at you: Microsoft (Hotmail), AOL, Yahoo, and Google. Just to name a few. Which means much of the same ad revenue that these companies may generate from your use of their free email services may still be generated for them through your use of Reagan.com.

This last point is key to highlighting the disconnect between the claim of the Reagan.com email service and the reality of the internet’s interconnectivity. This disconnect has also recently been highlighted with the controversial SOPA and PIPA bills passing through Congress. You have politicians proposing bills, or in this case making a buck using the influence of politics, on technical subjects in which they have little to no understanding.

If privacy is what you seek, you cannot use the internet, and you certainly cannot use email (unless it is isolated to an internal network). Even if a given email was secure and private while on the Reagan.com servers, any incoming and outgoing messages will go through a server at some point somewhere in the world that is likely owned, operated, or affiliated with one of the internet or server giants, including Google. Coincidentally, even if you had a Reagan.com email address and sent an email to yourself, the email would still go through one of these external servers before returning to you.

 

Next claim. Reagan.com is email for conservatives, right? So supposedly using Reagan.com will support a conservative agenda rather than a liberal agenda. Perhaps directly, and on the very surface, but indirectly (and about half an inch below the surface down to bedrock) no. As I said before, you can’t take something as intertwined and complex as the internet and expect to take the biggest internet giants out of it. Ironically, on the same site that Michael Reagan is falsely boasting that his service will get you away from those Big Brother liberal companies, he provides instructions for how to configure his email service to work on your mobile device. You know, the one made by Blackberry, Apple, or Motorola (owned by Google) running the Android OS (also owned by Google).

Let’s give Reagan the benefit of the doubt. Let’s assume he’s not trying to insinuate it’s Big Business we should distrust. Maybe he’s suggesting Google, Yahoo, and the like sell your information to the government, and that’s where the privacy risk comes in. This is half true … although they don’t sell it. And, again, Reagan.com won’t get you away from this. Even when using Reagan.com, as soon as the email leaves the Reagan.com servers, the United States government will have the opportunity to seize and view the email. They probably won’t, unless you’re a terrorist suspect, but they always have the right, no matter your provider, thanks to the Patriot Act. Heck, even on the Reagan.com servers the government has the right to seize it under this act.

 

There’s a phrase somebody once said that goes something like:

Is it really free if it costs you your privacy?

That’s up to you to decide, really. But if you believe internet companies are the only ones tracking personal information about your daily habits … well, let’s just say you should stop shopping at Target. Or Wal-Mart. Or Best Buy. Or really any major chain in America. Personally, I don’t think a corporation tracking your habits to better serve you with ads related to your interests is an invasion of your privacy.

The cost of Reagan’s supposedly private and secure email service is $40 per year. This service is rented from a man who has no technical expertise and is not a server administrator. His Terms of Service clearly and painfully guarantee you nothing in terms of support, up-time, warranty, or back-up. And if you’re expecting new features in the future … well, don’t hold your breath.

On the other hand, companies like Google and Yahoo have incentive to provide you with new features. They have incentive to guarantee you up-time, because every second their servers are down is ad revenue lost for them. They have dedicated support teams to ensure their servers are always running at peak health, and they have redundantly connected servers and farms, just in case.

Reagan’s servers go down? I’m sure they’ll get it back up eventually. But, you know, you’ve already paid them your $40, so they don’t lose money by the second when the service is down. And it is owned by a politician … so don’t expect a quick turnaround.

 

Important!

Disclaimer

I reject a good 90% of the comments for this post as spam. Not just because hateful ignorance is spam, but because those comments largely miss the point of the post. Though I am clearly a liberal (which many comments try to point out to me as some sort of shocking revelation), my liberal bias does not factor into a factual understanding of how the internet and email communication work. I never suggest in the post that you should not get Michael Reagan’s email service; I merely point out the fact that the money is still going to contribute to the same liberal policies as the free email services.

This post, I would think, would be something that conservatives would want to be informed of. Sure, the first few paragraphs contain a few political jokes, but the content of the actual post has little to do with actual politics. Ironically, conservatives instead comment to tell me that my facts are just opinions, and that the opinions of conservatives, politicians, and commentators are apparently more factual than those of a software engineer and server administrator. You’re certainly entitled to that belief. But don’t waste your time commenting on this post, because unless you actually have something constructive and relevant to share with the class, it won’t be approved.

I also realize that rejecting the political comments of raging conservatives simply validates their feelings of suppression and victimization. I’m more than happy to give you that satisfaction, so by all means, still post your comment and I’ll gladly “censor” you. You’re obviously here not to actually gain a better understanding of the world but simply to stick to your own worldview regardless of facts or expertise.

Of course, most of those comments seem to be written by people who clearly haven’t actually read this entire post, so it’s also unlikely they’ll read this disclaimer. Oh well! Liberal politicians and their policies welcome your $40 subscription to Reagan.com!

Mar 01

Using VirtualBox to Host a VPS

Oracle’s VM VirtualBox is a virtualization program that allows you to run another operating system from within your native operating system. Though it is most commonly used to run fully functional operating systems such as Linux or OS X from within Windows 7 (or vice versa), it can also be used to host a Virtual Private Server (VPS).

This post does nothing to compare benchmarks between more efficient (and recommended) VPS environments such as VMware or Linux-VServer, and I would not recommend using VirtualBox as a VPS in a production environment. However, it is useful in many situations, and I’ll let you be the judge of when this should or should not be done. It is certainly acceptable for personal and developmental purposes. And hosting a VPS through something like VirtualBox, which is extremely simple to set up and use, allows you to easily experiment with configurations and operating systems, or even jump between multiple VPSs on the same computer.

This tutorial assumes you have a rudimentary knowledge of server software and operating systems. I’m going to be explaining virtualization to you, not the details of the server installation and configuration.

 

Setting Up VirtualBox

First, some definitions. When I refer to the host operating system, that is the primary operating system that your computer boots into. When I refer to the guest operating system, that is the virtualized system that is run from within VirtualBox. There will also be references to IP addresses and ports on the host and guest. They follow the same theme. Now that we’ve got that out of the way …

You can pick up VirtualBox for free from their website here. Download and run the installer for your host operating system. Congratulations. VirtualBox is now ready to run. Unfortunately, it doesn’t have a guest operating system installed or configured yet, so it doesn’t do much for you. But before we actually install one of those, let’s create a virtual environment for it and configure some VirtualBox settings.

In VirtualBox, click New to create an environment where we install a guest operating system. I’m assuming you’re a civilized human being and installing a Linux server operating system, so select Linux, then select the version of operating system you’re using. If the exact version isn’t in the VirtualBox list, select the parent Linux distribution (for instance, for CentOS you’d select Fedora).

Ideally, you should grant at least half of your host system’s memory to the guest operating system. You should dedicate at least 8GB to the guest’s hard drive space. Luckily, since this is a virtual environment, you can choose to dynamically allocate this space, so the virtual hard drive will only consume space on your host’s hard drive as it is needed. Finish up the wizard, and the guest environment will be created.
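
If you prefer the command line, roughly the same environment can be created with VBoxManage, which ships with VirtualBox. This is only a sketch; the VM name, OS type, memory size, and disk size below are example values you should adjust:

VBoxManage createvm --name "MyVPS" --ostype Ubuntu_64 --register
VBoxManage modifyvm "MyVPS" --memory 2048
VBoxManage createhd --filename "MyVPS.vdi" --size 8192
VBoxManage storagectl "MyVPS" --name "SATA" --add sata
VBoxManage storageattach "MyVPS" --storagectl "SATA" --port 0 --device 0 --type hdd --medium "MyVPS.vdi"

The createhd command creates a dynamically allocated disk by default, matching the wizard’s behavior.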

Now, to make that guest environment accessible to our host computer. Right-click on the newly created environment and select “Settings”. Click on “Network” in the list on the left, and click on “Adapter 2”. Enable this adapter and, from “Attached to:”, select “Bridged Adapter”. This will cause the guest environment to obtain DHCP IP information from your network, just as the host does, so the guest gets its own address on your LAN. Next, we’ll add port forwarding so the guest’s services can also be reached through the host.

Go back to the “Adapter 1” tab, make sure this adapter is “Attached to: NAT”, and click “Advanced”. Click on “Port Forwarding” and add a new TCP forward. Let’s call it “SSH”. Specify 22 for both the host and guest ports. This will forward the host machine’s port 22 to the guest machine’s port 22 (the two ports don’t have to be the same; they just have to match the configuration on each side of the connection). It’s also worth adding an “HTTP” forward for port 80, as well as forwards for any other ports controlling services you’d like accessible from the guest environment.
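
If you would rather script these network settings, the equivalent VBoxManage commands look something like the sketch below (“MyVPS” is the example VM name from above, and “eth0” stands in for whatever your host’s network interface is called):

VBoxManage modifyvm "MyVPS" --nic1 nat
VBoxManage modifyvm "MyVPS" --natpf1 "SSH,tcp,,22,,22"
VBoxManage modifyvm "MyVPS" --natpf1 "HTTP,tcp,,80,,80"
VBoxManage modifyvm "MyVPS" --nic2 bridged --bridgeadapter2 eth0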

 

Server Operating System

If you haven’t already, now’s the time to choose what operating system you’re going to use for your guest environment. I recommend Ubuntu Server if you’re used to Ubuntu or Debian environments, and CentOS is another wildly popular one, though it’s not my cup of tea. Whatever operating system you choose, download the ISO for its installation and open up VirtualBox again.

Right-click on your guest environment and select “Settings”. From the list on the left select “Storage”, and point your virtual disc drive to the ISO you just downloaded. Once this is done, you can simply start the guest environment and it will boot with that disc “in the drive”, so you can install that operating system in the guest environment.
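
The same disc swap can be scripted with VBoxManage (again just a sketch; the controller name depends on how the environment was created, and the ISO path is an example):

VBoxManage storageattach "MyVPS" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium ubuntu-server.iso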

If you’re installing Ubuntu Server, selecting OpenSSH during the install process, as well as LAMP and any other services you’d like available, will make things much easier for you. However, as I said above, this tutorial assumes you have a rudimentary knowledge of server operating systems, so I’m not going to go into the details of installing those services. But to prove that our port forwards worked, you should at least install OpenSSH (during installation or as soon as you boot into the environment); if you are able to SSH to your host computer on port 22 and reach the guest environment, then everything worked the way it should have.
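
Assuming you forwarded host port 22 to guest port 22 as described above, a quick test from the host itself (or from another machine on your LAN, substituting the host’s IP address for localhost) would look something like this, where the username is whatever account you created during the guest’s installation:

ssh -p 22 yourusername@localhost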

 

Launching Server When Computer is Booted

It may be useful to launch this virtual server when the computer boots. To do this, create a BAT file with the following command:

VBoxManage startvm "VM Name" --type headless

Place a shortcut to this BAT file in the Startup folder of one (or all) user accounts and you’re good to go. The server will launch and run in the background, allowing you to SSH into the server to control it from a terminal.

For maintenance purposes, you may also want to create a second BAT file for stopping the server (since it’s running in the background with no visible window). To do so, create a BAT file with the following command:

VBoxManage controlvm "VM Name" poweroff

 

Access from External IP

Log in to your router and go to the Port Forwarding section. Add a new forward for port 22, and forward that port to the IP address of the host. Do the same for port 80 and any other ports you added during the configuration above. Now, by using the external IP address of your network, you can SSH into the guest operating system through port 22, and you can utilize the other services available on the other forwarded ports.

There’s a lot more that can be done from here (using DNS to point to your external IP address, mail servers, etc.), but this tutorial has gotten you to the point where you can use tutorials for non-virtualized environments to accomplish those goals. Good luck with your endeavors!

 

Feb 28

Secure PHP Login

When perusing the internet for discussions on PHP sessions and cookies in regards to credential validation and user logins, I’ve never been satisfied with the approaches I find. Many of the tutorials are just plain lousy or incomplete. And the others seem to imply that you should only use sessions or cookies and never mix-and-match, a confusion that would probably trip up many PHP novices. So I’ve decided to post a tutorial explaining the complete PHP login format I use for my sites and web applications. Before we start, I should let you know that you can grab all the source in this tutorial from GitHub.

How it Works

The way to create secure pages using PHP is a simple enough concept: determine the pages that can only be visited by logged in users and put a piece of code at the top of them to redirect logged out users to a login page. If a user visits the login page and is already logged in, they should be redirected to the main page.

So, how do you determine if a user has been logged in? You have PHP check to see if there’s a fingerprint that pairs the server to the client’s computer. To do this, PHP provides access to two mechanisms: sessions and cookies. Once a user has logged in with a valid username and password, you fingerprint either the server (session) or the client’s computer (cookie). Once the fingerprint is in place, each secured page just needs to check that it exists. If it does, show the page to the user; if not, kick the user back to the login page.

It’s that simple.
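
In its barest form, the idea looks something like the sketch below; the ‘project-name’ session key is just a placeholder, and the full implementation later in this post fleshes out the details.

<?php

// Bare-bones sketch: this sits at the very top of a secured page
session_start ();

if (!isset ($_SESSION['project-name']['userID']))
{
   // No fingerprint found, so kick the user back to the login page
   header ('Location: login.php');

   exit;
}

// ... render the secured page for the logged in user ...

?>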

Comparing Sessions and Cookies

Before you can really proceed, you need to understand the primary differences between sessions and cookies in PHP (and, well, anywhere). Let’s break them down for comparison:

Cookie

  • Stored on client’s computer
  • Slower, since they have to be sent to the server from the client’s computer
  • Limited in size and in how many can be stored on the client’s computer
  • Can be used across multiple servers
  • Can have a lengthy lifespan
  • Can be viewed and modified by client and can therefore be a security risk, depending on the content
  • Not available until the page reloads, since cookies are only sent to the server on page load

Session

  • Stored on server
  • Faster, since they are already on the server
  • Less bandwidth transfer since, rather than sending all data from client to server, the session only sends the session ID to be stored in a cookie on the client’s computer
  • Size of a session is dependent on the PHP memory limit set in php.ini, but my guess is that limit is significantly higher on your server than the 4k generally allotted to cookies
  • Cannot be used across multiple servers
  • Lifespan is short; by default, a session is destroyed when the browser has been closed
  • Can only be accessed through the server, so much more secure than cookies
  • Available immediately in code without a page reload

From the above, you should be able to deduce that if you are working with sensitive data (passwords, credit card data, etc.), a session should be used. If you simply want to carry non-sensitive data between pages (the contents of a shopping cart), a cookie may be used.

Now that we understand the differences between sessions and cookies functionally speaking, what are they? Basically, as far as the code is concerned, they’re just arrays. The cookie array can be accessed using $_COOKIE['project-name']['val-name'], and the session array is conditionally accessible by referencing $_SESSION['project-name']['val-name']. The session array is only accessible if you have started a session by calling session_start().

To store a value into a cookie, we use the provided function setcookie('project-name[val-name]', $myData, time () + $keepAlive). Now let’s break this down: val-name will be the string used to reference this cookie as shown in the paragraph above. Whatever is in $myData is the string that will be stored in the cookie, and the cookie will stay alive until $keepAlive seconds from the current time have passed.

To store a value into a session is much easier. After a session has started, you simply execute $_SESSION['project-name']['val-name'] = $myData. The values will be accessible as shown above so long as the session exists; that is to say, so long as the browser has not been closed and session_destroy() has not been called.
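
Put together, a minimal sketch of both mechanisms looks like this (the values are placeholders, and note that setcookie() must be called before any output is sent to the browser):

<?php

session_start ();

$myData = 'chocolate chip';
$keepAlive = 3600;   // keep the cookie alive for one hour

// Cookie: stored on the client, readable via $_COOKIE on the *next* request
setcookie ('project-name[val-name]', $myData, time () + $keepAlive);

// Session: stored on the server, readable immediately
$_SESSION['project-name']['val-name'] = $myData;

echo $_SESSION['project-name']['val-name'];

?>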

With this understanding of sessions and cookies now, you should be able to see that a session will be useful in allowing a user to login to a secured page, but that it will not allow a user to close the browser and return to that page still logged in. We’re just about to dive into the code that will allow for both of those things, but first let’s look at a common oversight.

The Shared Server Conundrum

This is a sneaky issue, because you likely won’t know that it exists until your security has been compromised, so I’ll let you in on the secret now.

PHP session variables are stored in /tmp by default, and this is true for every account on a server. Since the HTTP server software has read and write access to this folder, and every account on a shared server executes PHP as that same server user, there is never a complete guarantee that your sessions are completely safe when you’re in a shared server environment. It is also possible for session collisions to occur because of this, for instance, if you and another user on a shared server are using the same session variable names. For this reason, it’s a good idea to regularly regenerate the session ID, and it’s also smart to use session keys that are specific to the application you’re working with.

Another issue with shared server sessions in PHP is their timeout. Though you may set a session timeout of five hours, if another user on the shared server sets the timeout to something else, say two hours, your sessions may also be cleaned up after two hours, since PHP’s session garbage collector does not distinguish between users’ session files within the shared /tmp folder.

I don’t know of a remedy for the timeout issue, though you may be able to contact your server admin to ask if there is a user-based php.ini file that could be configured to store your sessions somewhere other than /tmp. There are also ways to store your sessions in a database, which would get rid of both of these potential issues.
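
If your host does allow per-account configuration, a sketch of the first workaround looks like this (the path is just an example, and both calls must happen before session_start()):

<?php

// Keep session files in a private folder instead of the shared /tmp,
// and set our own garbage collection lifetime (five hours, in seconds)
ini_set ('session.gc_maxlifetime', 5 * 3600);
session_save_path ('/home/your-account/private-sessions');

session_start ();

?>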

Regardless, neither of these issues is an extreme vulnerability, but they are something you should be aware of. If your application simply cannot share its sessions with other users, or your session data needs to be tightly maintained and secured, your best bet is to go with a dedicated server.

User Database

Before we can make a secured page that only certain users have access to, we need an access list of those users and their credentials, right? The way we achieve that goal is with a database. In our code example below, we’re using a MySQL database, so you’ll need to perform the following steps using MySQL:

  • Create a database named project_name
  • Create a table within project_name named Users
  • Users should have (at least) three columns: UserID int(11), Username char(25), and Password char(60)
    • The UserID column needs to be unique and auto-incrementing, starting at one (1)—the code below checks for a UserID equal to zero, which means that the user was not in the database
    • Ideally, the UserID column should be the primary index for the table
  • Users should have (at least) one row added: plain text Username, and hashed Password

Once a MySQL database is set up like this, you’re ready to write the PHP code.
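
If you’d like a starting point, here is one possible set of MySQL statements matching the schema described above (adjust names and sizes to taste, and remember that the Password value for any row you add should be a Blowfish hash, such as one produced by the blowfishCrypt() function below):

CREATE DATABASE project_name;

USE project_name;

CREATE TABLE Users (
   UserID int(11) NOT NULL AUTO_INCREMENT,
   Username char(25) NOT NULL,
   Password char(60) NOT NULL,
   PRIMARY KEY (UserID)
);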

If you are a PHP beginner, please look into database sanitization. Anytime you are going to accept input from a web form and pass that input into a database (for example, when accepting user credentials and logging that user into the website), you need to sanitize the inputs to prevent potential attacks on your website. In the source code below, database inputs are sanitized through the use of PHP’s PDO library and its prepared statements.

The Code

The snippets of PHP code below are robust enough to be deployed with a large-scale web application. If all you require is a simple authentication page and don’t much plan on using the session variables throughout your user’s stay, this code can easily be trimmed down to fit those needs as well. So, let’s walk through the code, shall we?

class-databasehelpers.php

If you are making a large-scale web application, a database helpers class can help streamline repetitive database calls. If you are making a simpler login interface, you can move the functionality within this class into functions.php.

If your application eventually has a settings.php file, it’d make more sense to move the defined database constants out to that file.

<?php

define ('DB_HOST', 'localhost');
define ('DB_NAME', 'project_name');
define ('DB_USERNAME', 'sql-username');
define ('DB_PASSWORD', 'sql-password');

class DatabaseHelpers
{
   public static function blowfishCrypt($password, $length)
   {
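      // Note: despite its name, $length is used as the Blowfish cost parameter
      // (e.g. 10), not the length of the salt; the salt itself is always 22
      // characters drawn from $chars (rand() is not a cryptographically strong
      // source, so consider a stronger generator for production use)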
      $chars = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
      $salt = sprintf ('$2a$%02d$', $length);
      for ($i=0; $i < 22; $i++)
      {
         $salt .= $chars[rand (0,63)];
      }

      return crypt ($password, $salt);
   }

   public static function getDatabaseConnection()
   {
      $dbh = new PDO('mysql:host=' . DB_HOST . ';dbname=' . DB_NAME, DB_USERNAME, DB_PASSWORD);

      $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

      return $dbh;
   }
}

?>

class-userdata.php

The UserData class should be an almost identical interface to the MySQL Users table. Almost identical. It should not include the Password field: PHP handles checking that value, and beyond that, the user’s password, hashed or not, should never need to be displayed.

This class is unused by this tutorial, but it is a template that can be used to easily retrieve information from a database table. When you’re ready to move on beyond the login page, you can easily use PDO to fill class variables from the corresponding columns in a database table with a call like $stmt->setFetchMode(PDO::FETCH_CLASS, 'UserData'), and then calling $stmt->fetch() to fill the class variables.

<?php

class UserData
{
   public $UserID;
   public $Username;
}

?>
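
To illustrate the PDO::FETCH_CLASS approach described above, here’s a hypothetical sketch that fills a UserData object from the Users table (the query and the UserID value are just examples):

<?php

require_once ('class-databasehelpers.php');
require_once ('class-userdata.php');

$dbh = DatabaseHelpers::getDatabaseConnection();

$stmt = $dbh->prepare('SELECT UserID, Username FROM Users WHERE UserID=:userID LIMIT 1');
$userID = 1;
$stmt->bindParam(':userID', $userID, PDO::PARAM_INT);
$stmt->setFetchMode(PDO::FETCH_CLASS, 'UserData');
$stmt->execute();

// fetch() now returns a populated UserData object, or false if no row matched
$userData = $stmt->fetch();
if ($userData)
{
   echo $userData->Username;
}

$dbh = null;

?>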

class-users.php

The Users class is used to retrieve, assess, and modify data stored in the UserData class. For our purposes, we only need a checkCredentials() function to validate the given username and password against MySQL database elements.

<?php

require_once ('class-databasehelpers.php');
require_once ('class-userdata.php');

class Users
{
   public static function checkCredentials($username, $password)
   {
      // A UserID of 0 from the database indicates that the username/password pair
      // could not be found in the database
      $userID = 0;
      $digest = '';

      try
      {
         $dbh = DatabaseHelpers::getDatabaseConnection();

         // Build a prepared statement that looks for a row containing the given
         // username/password pair
         $stmt = $dbh->prepare('SELECT UserID, Password FROM Users WHERE ' .
                               'Username=:username ' .
                               'LIMIT 1');

         $stmt->bindParam(':username', $username, PDO::PARAM_STR);

         $success = $stmt->execute();

         // If the query executed and returned a row, we have found the user
         if ($success && ($userData = $stmt->fetch()))
         {
            // Ensure the provided password matches the stored hash
            $digest = $userData['Password'];
            if (crypt ($password, $digest) == $digest)
            {
               $userID = $userData['UserID'];
            }
         }

         $dbh = null;
      }
      catch (PDOException $e)
      {
         $userID = 0;
         $digest = '';
      }

      return array ($userID, $username, $digest);
   }
}

?>

pages.php

This class acts as an enum of pages on your site.

<?php

// To get around the fact that PHP won't allow you to declare
// a const with an expression, define our constants outside
// the Page class, then use these variables within the class
define ('LOGIN', 'Login');
define ('INDEX', 'Index');

class Page
{
   const LOGIN = LOGIN;
   const INDEX = INDEX;
}

?>

functions.php

Here’s where it gets fun. As you create more pages that should only be accessible to validated users, make sure you add them as an OR to the return of isSecuredPage().

The checkLoggedIn() function is our primary workhorse. This function checks to see if the current page requires validation. If the page requires validation and the user is not logged in, they are redirected to login.php. If a user has been logged in and visits the login page, they are redirected to the main page. If the user has been logged in, this function allows them to access secured pages. The checkLoggedIn() function is also responsible for completing both the login and logout processes, and on a successful login it sets the proper session and cookie variables.

Take note of how the secondDigest cookie parameter is being used. We need to store authentication information in the cookie so we can securely implement the “Remember me” functionality, but if all we store are credentials, the cookie could still be stolen and reused. To protect against this, we also store physical characteristics of the connection, in this case the IP address and HTTP User-Agent information. That data is hashed as well so a hijacker can’t simply spoof it when they steal the cookie. Now, if a hijacker takes our cookie to their own computer, the cookie will pass user authentication but fail the second digest check, and the hijacker will be prompted to log in again.

You would be wise to modify what exactly goes into the second digest. If a standard set of values were used, hashing it would be pointless, even with the salt. Additional salt beyond the Blowfish cipher would be good, as would adding additional information, reordering the information before it’s hashed, etc. For increased security, you could also store the second digest on the server in the Users table and compare the cookie’s value with that value (which would need to be updated after each successful login).

<?php

require_once ('class-databasehelpers.php');
require_once ('class-users.php');
require_once ('functions.php');
require_once ('pages.php');

function isSecuredPage($page)
{
   // Return true if the given page should only be accessible to validated users
   return $page == Page::INDEX;
}

function checkLoggedIn($page)
{
   $loginDiv = '';
   $action = '';
   if (isset($_POST['action']))
   {
      $action = stripslashes ($_POST['action']);
   }

   session_start ();

   // Check if we're already logged in, and check the cookie credentials against the
   // connection's fingerprint to protect against session hijacking
   if (isset ($_COOKIE['project-name']['userID']) &&
       crypt($_SERVER['REMOTE_ADDR'] . $_SERVER['HTTP_USER_AGENT'],
             $_COOKIE['project-name']['secondDigest']) ==
       $_COOKIE['project-name']['secondDigest'] &&
       (!isset ($_COOKIE['project-name']['username']) ||
        (isset ($_COOKIE['project-name']['username']) &&
         Users::checkCredentials($_COOKIE['project-name']['username'],
                                 $_COOKIE['project-name']['digest']))))
   {
      // Regenerate the ID to prevent session fixation
      session_regenerate_id ();

      // Restore the session variables, if they don't exist
      if (!isset ($_SESSION['project-name']['userID']))
      {
         $_SESSION['project-name']['userID'] = $_COOKIE['project-name']['userID'];
      }

      // Only redirect us if we're not already on a secured page and are not
      // receiving a logout request
      if (!isSecuredPage ($page) &&
          $action != 'logout')
      {
         header ('Location: ./');

         exit;
      }
   }
   else
   {
      // If we're not already the login page, redirect us to the login page
      if ($page != Page::LOGIN)
      {
         header ('Location: login.php');

         exit;
      }
   }

   // If we're not already logged in, check if we're trying to login or logout
   if ($page == Page::LOGIN && $action != '')
   {
      switch ($action)
      {
         case 'login':
         {
            $userData = Users::checkCredentials (stripslashes ($_POST['login-username']),
                                                 stripslashes ($_POST['password']));
            if ($userData[0] != 0)
            {
               $_SESSION['project-name']['userID'] = $userData[0];
               $_SESSION['project-name']['ip'] = $_SERVER['REMOTE_ADDR'];
               $_SESSION['project-name']['userAgent'] = $_SERVER['HTTP_USER_AGENT'];
               if (isset ($_POST['remember']))
               {
                  // We set a cookie if the user wants to remain logged in after the
                  // browser is closed
                  // This will leave the user logged in for 168 hours, or one week
                  setcookie('project-name[userID]', $userData[0], time () + (3600 * 168));
                  setcookie('project-name[username]',
                  $userData[1], time () + (3600 * 168));
                  setcookie('project-name[digest]', $userData[2], time () + (3600 * 168));
                  setcookie('project-name[secondDigest]',
                  DatabaseHelpers::blowfishCrypt($_SERVER['REMOTE_ADDR'] .
                                                 $_SERVER['HTTP_USER_AGENT'], 10), time () + (3600 * 168));
               }
               else
               {
                  setcookie('project-name[userID]', $userData[0], false);
                  setcookie('project-name[username]', '', false);
                  setcookie('project-name[digest]', '', false);
                  setcookie('project-name[secondDigest]',
                  DatabaseHelpers::blowfishCrypt($_SERVER['REMOTE_ADDR'] .
                                                 $_SERVER['HTTP_USER_AGENT'], 10), time () + (3600 * 168));
               }

               header ('Location: ./');

               exit;
            }
            else
            {
               $loginDiv = '<div id="login-box" class="error">The username or password ' .
                           'you entered is incorrect.</div>';
            }
            break;
         }
         // Destroy the session if we received a logout or don't know the action received
         case 'logout':
         default:
         {
            // Destroy all session and cookie variables
            $_SESSION = array ();
            setcookie('project-name[userID]', '', time () - (3600 * 168));
            setcookie('project-name[username]', '', time () - (3600 * 168));
            setcookie('project-name[digest]', '', time () - (3600 * 168));
            setcookie('project-name[secondDigest]', '', time () - (3600 * 168));

            // Destroy the session
            session_destroy ();

            $loginDiv = '<div id="login-box" class="info">Thank you. Come again!</div>';

            break;
         }
      }
   }

   return $loginDiv;
}

?>

login.php

This is the base for a login form on the login page. Notice that now that we’re modifying front-facing PHP files, the only reference you see to the heavy lifting is a simple call to our checkLoggedIn() function. The form handles POSTing to this page to log the user in and redirect them to index.php.

The $loginDiv that we receive from checkLoggedIn() allows us to display informative statuses to the user, for instance, if they try to login with the wrong password.

<?php

require_once ('functions.php');

// Check to see if we're already logged in or if we have a special status div to report
$loginDiv = checkLoggedIn (Page::LOGIN);

?>

<html>
   <body>
      <h2>Sign in</h2>
      <form name="login" method="post" action="login.php">
         <input type="hidden" name="action" value="login" />
         <label for="login-username">Username:</label><br />
         <input id="login-username" name="login-username" type="text" /><br />
         <label for="password">Password:</label><br />
         <input name="password" type="password" /><br />
         <input id="remember" name="remember" type="checkbox" />
         <label for="remember">Remember me</label><br />
         <?php echo $loginDiv ?>
         <input type="submit" value="Login" />
      </form>
   </body>
</html>

index.php

Last, but certainly not least, our secured pages. All the work we’ve done above to ensure a robust application allows us to make one simple call from a secured page: checkLoggedIn(). Everything we’ve done above handles the rest. Add this call to any page you want to be secured and you’re good to go!

One thing to note is the logout button, which simply POSTs a logout action to login.php.


<?php

require_once ('functions.php');

checkLoggedIn (Page::INDEX);

?>

<html>
   <body>
      <form name="logout" method="post" action="login.php">
         <input type="hidden" name="action" value="logout" />
         <input type="submit" value="Logout" />
      </form>
   </body>
</html>

The Common Exit Issue

Take special note that as soon as it has been determined that checkLoggedIn() in functions.php succeeded or failed (i.e. following a header call to redirect), exit is called. This is crucial if your secured page makes ready use of your session or cookie variables, because it tells PHP to cease construction of the page immediately. It is a common mistake to not call exit after a header redirect, which is not necessarily insecure, but it is poor practice. If you fail to call exit immediately, the remainder of the page will still be evaluated by PHP (though the variables may not have been initialized), and errors may be reported. No data will be displayed to the user, but neglecting to call exit may fill up your PHP error logs.
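
The pattern, wherever a redirect is issued, is simply:

<?php

// Redirect, then stop building the page immediately
header ('Location: login.php');

exit;

?>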

The Payoff

You now have a login page, secured content areas, cookie storage for returning users, and working sessions throughout your pages. What’s cool about this from this point forward is that you can easily apply this new knowledge of cookies and sessions outside of the credentials realm.

You now have live sessions on your pages, so you can store additional values in the $_SESSION variable to carry them between pages. You’ve seen how cookies work, so you can curse your clients with crumbles of your website for the next time they return (don’t be evil).

If you have any further questions regarding the login process, sessions, or cookies, or if you just found this tutorial useful, let me know in a comment.

Jan 18

SOPA Highlights

The Stop Online Piracy Act (SOPA) and the Protect Intellectual Property Act (PIPA) are two bills currently making their way through the United States Congress. It would take too much time and effort for me to explain how ludicrous it is that we have misinformed politicians writing legislation for an internet that they apparently do not understand. Instead, I’ll assume you understand the basics of these bills, and I’ll just point out my favorite things I’ve seen relating to them as the bills have progressed. If you do not understand these bills and are confused by an internet search for them (they are extremely complicated), you can ask me about them apart from the blog, and I’d be happy to explain them to you as best I can.

 

So, these bills just stop online piracy, right? How could stopping illegal activity be bad? This is the notion being perpetuated by the media, the MPAA and RIAA, and some big copyright holders–translation, the supporters of the bill. It’s also a sentiment shared by people who generally don’t understand how the internet works. The truth is, these bills do far more in what they don’t say than in what they do say. Ultimately, they’re creating an internet blacklist controlled by the government. Translation: government censorship.

The SOPA bill was proposed by a Republican Congressman from Texas named Lamar Smith. Yes, this is the same Lamar Smith who got the Digital Millennium Copyright Act passed. It’s also the same Lamar Smith who has admitted that he does not have a full understanding of the internet. Two things seem ironic here: first, we have politicians who admit that they do not fully understand “something” passing legislation against that “something”. Second, we have apparently “small government” Republicans trying to pass censorship bills.

Today, the Internet Blackout is taking place, and it’s essentially the first day the media corporations have at least nodded their head in the direction of this legislation. Why? The bills have been tossed around for months. By all appearances, it would appear that the media was trying to keep the bills hushed up so they would quickly pass. Unfortunately, the internet giants have made sure to get the word out. Only now, after people have started hearing about the bills through other sources, has the media started covering them. That makes sense, considering the MPAA, RIAA, and other media outlets have financially supported the drafting of the bills.

When SOPA was shelved last weekend, that was the first time the media really covered the story. They were very careful to use strong language like “killed” and “terminated” in reference to the SOPA bill. However, the bill was not “killed”. It was temporarily shelved, sure to come back in the near future (quietly, they’ve said they’re taking it back off the shelf in February … that’s not far from now). And there sure was a lot of attention focused on SOPA being shelved when PIPA was still alive and well, proceeding toward a vote.

There are several provisions at the beginning of the SOPA bill which state the bill intends to defend the First Amendment, protect the integrity of the internet, and promote cyber security. Interestingly, the very nature of the bill breaks down each of those things, which illustrates the lack of understanding the drafters of the bill have in regards to the internet.

This next point is particularly controversial, but SOPA and PIPA assume all forms of copyright infringement are intentional and inherently evil. Recent surveys indicate that over 20% of Americans have pirated something at some time, over 70% of Americans aged 25 to 35, and over 90% of Americans under the age of 25. Does this mean 90% of Americans 25 and under are actively striving to steal? No, it means that the very nature of the internet is advertising and publicity. It may seem a stretch to suggest that piracy is publicity, but it’s no more a stretch than the one the MPAA and RIAA make when stating that every pirated download is a lost sale. More importantly, however, this illustrates that much of the internet’s piracy is not intentional theft, and therefore cannot be counted as “lost sales.”

Lamar Smith, the lead supporter and the Congressman who introduced the bill to the House, actually illegally hosted copyrighted material on his website until a few weeks ago. The background image of his website was a photograph taken by DJ Schulte, used without permission. The website went down shortly after a news article pointed this out, and the image has since been removed. However, SOPA doesn’t have any clauses for forgiveness. He hosted copyrighted content without permission. Shouldn’t he be held responsible? This is not an example of how copyright infringement is okay or should be tolerated. This is illustrating how even well-intentioned websites would be subject to blocking merely because they inadvertently used copyright-infringing material.

But Alex, you say, shouldn’t copyright holders be able to force someone to take their content down if they are using it without permission? Yes. And they can. There are already laws in place for that. SOPA is not meant to do that, SOPA is meant to give the government the ability to force blocking those websites.

Of course, under SOPA Lamar’s website would not actually be taken down. SOPA strives to block foreign websites, as most copyright infringing hosts are not domestic to the United States. An example of this used time and time again is ThePirateBay.org, a Swedish website, Hollywood’s nemesis, that hosts torrents of anything and everything. Unfortunately for Congress, though the registrar and servers to ThePirateBay are foreign, the registry of the domain is hosted on a .org domain, which is actually domestic to the United States. Some have argued whether this is truly what the bill meant to say, so it may prove to be a moot point, but on the surface it certainly looks like their poster child for evil is immune from the bill.

When chief analysts and internet architects (including Vint Cerf [TCP/IP], Jim Gettys [HTTP/1.1], Leonard Kleinrock [ARPANET], and more … read: “the guys who created the internet”) approached Lamar and Congress to explain to them that their bill was fundamentally flawed, would break the internet, and would destroy the constructs of cyber security, Congressman Smith replied by saying that the opinions of the opposition “do not matter.” Which, in my opinion, is a great way to get re-elected. He also went on to say that the opposition was a “small minority” of the internet. Really? You would consider hundreds of millions of users, not to mention every internet giant and nearly every other tech corporation to be a “minority”? I guess we’ll see how big a “minority” can be after the petition results come out after today.

After the bill started receiving heated response from the internet community, the White House came out with their opinion on the matter. They expressed that they did not approve of the bills, and it was implied that President Obama would simply veto the bills if they were passed. This was when SOPA was shelved. However, when asked about the White House’s response, Lamar and other SOPA supporters said they were “glad to have the support of the White House,” and that they were now “looking forward to pushing this bill through to passing.” Sounds like denial to me.

 

Ultimately, this legislation does nothing to stop the problem it claims to be solving: piracy. It slaps a band-aid on a symptom (or at least tries to), but in doing so it sinks to the level of China’s internet censorship. The proposed laws also fail to draw any solid lines as to where the government would have to stop censoring. Copyright protection laws already exist. SOPA and PIPA merely try to take the burden of maintaining their rights off of the copyright holders and move it onto the content providers. For small providers, this might be manageable. But for giants like Google, Facebook, or Twitter, it’s absurd to suggest that those companies should monitor what their users are doing (a First Amendment violation) and remove linked content based on what another website is doing.

What I have pointed out are only surface-level absurdities in the SOPA and PIPA bills. I have many other opinions when it comes to matters of piracy, the figures of monetary “losses” the MPAA and RIAA claim each year that are apparently due to pirating, and internet censorship. But it would take far too many blogs to explain all of those as well. It comes down to the fact that the verbiage of the bills tampers not just with the content of the internet, but with the security and the infrastructure of the web as well. They may appear to simply be “protecting copyright material”, but you shouldn’t rip up a street just because the street may lead to a disreputable city, or to the house of a thief. Go arrest the thief. Don’t prohibit anyone from driving on a road near him. And, as Congressman Lamar Smith should probably learn, you may want to better define what a “thief” truly is.

 

If you’re interested in understanding the evils of SOPA and PIPA, check out this article on reddit—the Devil is in the details. I also strongly urge you to sign Google’s petition against SOPA and PIPA before January 24th.
