
Facebook App Prefetching Looks Like a DDoS Attack

Recently, Facebook announced that their mobile application will implement content pre-fetching. This means that if somebody creates an FB post about a page on your website (or you run Facebook ads with a link to your site), as FB users view their timeline and see that post, the mobile app fires off a “GET” request to the linked content on your server. The FB app caches that content for a short time, and if the user clicks on the post, the app serves up the cache before sending you on to the actual site so that the response time appears to be reduced.

This is both good and bad. It’s good because who doesn’t want their website to appear to load faster for users who are trying to reach their content? It’s potentially bad because the higher the “post reach” in the Facebook network, the more prefetch requests will hit your server. And it looks a lot like a DDoS attack: spikes of traffic from all over the world, none of which shows up in your JS analytics solutions (since prefetching does not parse and execute the JS).

Recently, even with a small reach (60,000 users reached), we experienced a consistent surge of 60 requests per minute, in bursts lasting 10 minutes at a time, as a boosted post and some ads rolled out across the FB network. All of it came from Android devices (according to the logged user agents), via the FB mobile app’s web-view user agent, and mostly from the geographic region targeted by the ads.

With enough cash and desire to drive user acquisition, we could essentially pay Facebook to DDoS ourselves. (Or if you’re lucky enough to have a huge page following, your post could potentially do that without boosting).

Facebook does send an “X-Purpose: preview” header to let you know what the request is about, but standard “combined format” log files will be a bit confusing at first (lots of traffic, random IPs, all Android devices, and nothing in your JS analytics platforms).

In NGINX, you can log that header by extending the combined format:

log_format combinedPurpose '$remote_addr - $remote_user [$time_local] '
                           '"$request" $status $body_bytes_sent '
                           '"$http_referer" "$http_user_agent" purpose:$http_x_purpose';

access_log /var/log/nginx/access.log combinedPurpose;
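
To confirm the new format before any real Facebook traffic arrives, you can replay a prefetch-style request against your own server (the URL below is just a placeholder for a page on your site) and check that the purpose field shows up in the newest log line:

# Simulate the header Facebook's prefetcher sends, then inspect the latest log entry
curl -s -o /dev/null -H "X-Purpose: preview" http://www.example.com/some-page/
tail -n 1 /var/log/nginx/access.log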

I’m not sure I agree with Facebook’s decision to do this. If website operators need to provision extra capacity just to handle users scrolling through an app (with a huge install base) that they don’t control, that seems like a huge waste of resources (electricity, virtual machines spun up to absorb the traffic), or else it means operators need a very smart caching plan. All for the chance that a user clicks a post and saves a few seconds.

Using Iodine DNS Tunneling on OS X Mavericks

For a long time I had a T-Mobile unlimited data plan that allowed tethering my laptop to my Android Moto G LTE phone, which runs Android KitKat 4.4.3. After switching plans, tethering is now blocked by an “up-sell” screen when I connect to my hotspot. I don’t really mind paying for tethering, so I called up T-Mobile, only to be told that my particular day-by-day “unlimited” data plan doesn’t even have a tethering add-on, even though I’m willing to pay a little extra for it. Well then. Even the “un-carrier” is still a carrier.

So — what to do?

Well, back in the old days, T-Mobile used to block tethering just by inspecting the browser’s User-Agent string and would redirect users to an up-sell tethering page if a mobile browser wasn’t detected. That might stop most users, but with User-Agent switching plugins readily available for all the major browsers, this used to be an easy workaround.

After a lot of googling, nobody really seems to know definitively how T-Mobile is detecting tethering, and thus there are lots of proposed workarounds. Many suspect that T-Mobile is inspecting the packets themselves (headers and metadata rather than the payload) to determine where those packets originated, and blocking the ones that came from a tethered device.

Perhaps it has something to do with how KitKat sets up separate routing tables for tethered data, which could give the carrier a way to differentiate tethered from non-tethered traffic. By rooting your phone you can set up different routing tables, and that might work, but I don’t want to root my phone (yet).

Perhaps the carrier is inspecting packets’ TTL values. Since packets from the tethered computer arrive with a different TTL value than packets originating on the phone, the carrier could use that difference to decide which packets to block. By changing OS X’s IP TTL value, perhaps we can slip them around the roadblock. This didn’t work for me.
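
For reference, the knob one would tweak on OS X is a sysctl; 65 is the value usually suggested for tethering workarounds (one hop above Android’s default of 64), though as noted it didn’t help in my case:

# Set the default IP TTL for outgoing packets (resets on reboot)
sudo sysctl -w net.inet.ip.ttl=65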

Some people reported success after some technical machinations to flip the “tether_dun_required” flag on the Android phone from 1 to 0. Perhaps this worked for a while, or perhaps on some phones, but it doesn’t work for me on the Moto G.

UPDATE: Actually, if you are on T-Mobile and tethering is blocked, there is something quite easy you can try which did work for me: update the T-Mobile APN (“Access Point Name”) configuration. To do so on your Android: Settings -> More… -> Mobile Networks -> Access Point Names. Then, tap “T-Mobile US LTE” (it might have a different name, but the URL you see underneath it should be “fast.t-mobile.com”). Tap “APN type” and append this to that setting:

,dun

Then save the settings. Tethering works again. My understanding of the “dun” APN type is that it stands for “Dial Up Networking”. Essentially, when your phone requires network access, it needs to connect to the data network to do any of a variety of things, say use the internet or send an MMS. When your phone’s tethering hotspot is enabled, it requests this “dun” APN type, and Android connects using the settings for an APN in your list that has the “dun” type. Apparently, the fast.t-mobile.com APN allows tethering, whereas the other “dun” APN that existed on my phone, pcweb.t-mobile.com, does not. That seems a little fragile, and one day T-Mobile may get wise. But for now, it works.

So — what about this whole Iodine DNS tunneling thing, then?

Well, we still have airport hotspots to tunnel through, now don’t we?

DNS tunneling basically means that if your computer can send DNS queries and receive valid DNS responses, we can hide our network traffic inside those DNS packets. This means we need to run a server process (iodined) on a remote machine with port 53 open to receive and unpack these DNS packets, configure DNS entries to point to that server in a particular way, and then run a local client (iodine) on your OS X machine. Once you’ve got that client running, you can reach the remote machine, but you still have to route your computer’s traffic through it. This can be done in different ways. If you’re just browsing the web, probably the easiest way is via an SSH SOCKS proxy. Or, you could fiddle with your routing tables to send all your traffic over the tunnel.

Let’s begin.

The DNS Entries

I use Amazon’s Route 53 service, which makes it easy to manipulate DNS records for any domain you control. The basic process is that you need to set up an “NS” record for a subdomain of a domain you control. It can be a little confusing:

  • Let’s say you control example.com.
  • Choose a subdomain you want to use for the tunnel; it can be anything, say t.example.com (t for tunnel! keep it short)
  • We’re going to run the iodined process on a server at IP Address A.B.C.D.
  • This iodined process on A.B.C.D is going to act as the authoritative nameserver for the t.example.com subdomain

This iodined server process binds to port 53, just like a DNS server. The iodine client process that we’ll run on our laptop is going to take our computer’s traffic and wrap it up into DNS requests for t.example.com. Any upstream DNS server from our client is going to say “Oh, hey, you should query A.B.C.D for the IP of t.example.com — send those packets over there”.

Iodined will then take our DNS requests with the wrapped-up traffic, dump it onto the server’s network, get a response, and then “answer” the DNS request with a wrapped-up response. The packets look like normal DNS traffic, so in theory they should be able to be passed around the internet “per usual”, except that these DNS packets contain extra data, namely the traffic to and from your computer.

Because a lot of captive portals (like a cell carrier that blocks tethering, or an airport hotspot) allow DNS traffic to the outside world (if not HTTP or other traffic), as long as we can reach/query our iodined DNS server and receive responses from it, we’re in business.
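
One quick sanity check from behind a captive portal: if ordinary lookups for outside names still get answered, the portal is relaying DNS upstream and iodine has a fighting chance:

# If this returns an address while everything else is blocked, DNS is being relayed
dig +short google.com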

The trick is, we need to tell the world “Hey — if you want to know the IP address of t.example.com, look for it here, at DNS server A.B.C.D”.

So, basically, let’s call our nameserver ns.t.example.com. It points to A.B.C.D in our DNS setup as an A record:

ns.t.example.com A record => A.B.C.D

Now, we need to assign that shiny new ns.t.example.com as the nameserver for t.example.com. This is a “nameserver” (NS) record:

t.example.com NS => ns.t.example.com

That’s it — anytime any client wants to know the IP address of “t.example.com”, it’s going to ask “ns.t.example.com” which runs our iodined process.
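
You can sanity-check those two records before iodined is even running (the names are the placeholders from above):

# The glue record should resolve to your server
dig +short A ns.t.example.com
# The trace should show example.com handing t.example.com off to ns.t.example.com
dig +trace t.example.com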

The Server

I use a small Amazon AWS EC2 instance running Ubuntu. You need to make sure that the security group assigned to the instance allows incoming traffic on port 53 (the standard port for DNS processes).

As root:

apt-get install iodine

We need to then actually run an iodined process. In doing so, we need to tell iodine what subnet we are going to use for our little private tunnel. Your computer is going to create a virtual tunnel device that will use this same subnet. So — it’s very important to use a subnet that is not being used by the server OR your computer. Amazon EC2 uses portions of the private 10.0.0.0/8 range for internal addressing, and it uses portions of the 172.16.0.0/20 subnet for internal services like its own DNS system. Most home routers use 192.168.0.0/24 or sometimes 192.168.1.0/24. This worked for me:

iodined -f -c -P secret 192.168.99.1 t.example.com

Replace “secret” with a passphrase that the client will also supply; we don’t want to route traffic for just anybody. (Make sure you are running “iodined” with a “d” at the end! The program “iodine” (no “d”) is the client…)
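
It’s also worth confirming that iodined actually grabbed UDP port 53 and that nothing else (bind, dnsmasq, etc.) is already sitting on it:

# Should show iodined listening on 0.0.0.0:53 (UDP)
sudo netstat -ulnp | grep ':53 '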

OS X

Obviously you need the iodine program installed. The easiest way is to install homebrew and then “brew install” it:

brew install iodine

On OS X Mavericks, you are going to have to do this as well (to get the tuntap tunnel working correctly):

# Copy the tun/tap kexts installed by the Homebrew tuntap formula into place
sudo cp -pR $(brew --prefix tuntap)/Library/Extensions/tap.kext /Library/Extensions/
sudo cp -pR $(brew --prefix tuntap)/Library/Extensions/tun.kext /Library/Extensions/
sudo chown -R root:wheel /Library/Extensions/tap.kext
sudo chown -R root:wheel /Library/Extensions/tun.kext
# Touch the directory so the kernel extension cache gets rebuilt
sudo touch /Library/Extensions/
# Install the startup items so the kexts load on boot
sudo cp -pR $(brew --prefix tuntap)/tap /Library/StartupItems/
sudo chown -R root:wheel /Library/StartupItems/tap
sudo cp -pR $(brew --prefix tuntap)/tun /Library/StartupItems/
sudo chown -R root:wheel /Library/StartupItems/tun
# Load the kexts now (the tuntap bundle identifiers are foo.tun and foo.tap)
sudo kextload -b foo.tun
sudo kextload -b foo.tap

Then, we should be able to run the iodine client on our localhost:

sudo iodine -f -P secret t.example.com

Note that “iodine” might not be in your PATH. If it’s not, you can call it directly from where homebrew installs programs:

sudo /usr/local/Cellar/iodine/0.7.0/sbin/iodine -f -P secret t.example.com

Note: your version might not be “0.7.0” — adjust as needed.

You should now be able to “ping” the remote server through the tunnel:

ping 192.168.99.1

If your local iodine process complains that you are getting too many “SERVFAIL” responses, you can start the command with a small interval, but note that the smaller the interval, the more DNS traffic you’ll be creating:

sudo iodine -f -P secret -I1 t.example.com

Routing Traffic

Now that you can ping, you can also SSH into the remote machine. If all you need is SSH, hey, you’re good to go:

ssh user@192.168.99.1

But most of us want to at least browse the web. The easiest way is to set up a SOCKS proxy via SSH, then tell your browsers to use that proxy to route all HTTP traffic. Another way is to fiddle with our routes to send all traffic over the tunnel.

SSH SOCKS Proxy

To setup a SOCKS Proxy over SSH:

ssh -N user@192.168.99.1 -D 1080

This binds the proxy to localhost:1080. Any requests we send to the SOCKS proxy at localhost:1080 will be forwarded out through the remote machine. Tell OS X browsers to use the proxy:

  • Go to System Preferences -> Network -> Advanced -> Proxies.
  • Select “SOCKS Proxy”.
  • Set the proxy to localhost:1080
  • Click the “OK” button
  • Click the “Apply” button on the main network settings pane

Open a browser and your traffic should be routed over the SSH proxy.
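
If you’d rather verify from a terminal first, curl can talk to the SOCKS proxy directly; ifconfig.me is just one of many “what is my IP” services, and the address it reports should now be your remote server’s, not the hotspot’s:

curl --socks5-hostname localhost:1080 http://ifconfig.me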

Ok, that’s awesome, but what if we want Mail, Dropbox, and other non-HTTP traffic to be sent over the tunnel as well? Set up some routes. Oh, and you need to set up NAT on the remote server and alter the iptables rules as well. A little more of a headache, but doable.

Routing and NAT

This script is a great way to automatically start up iodine on your laptop and it also sets up the routes (and tears them down later) for routing all traffic through the iodine tunnel:

http://git.homecomputing.fr/my-dotfiles/blob/master/NStun.sh

Once you grab that, you need to alter some of the variables. In our example with a homebrew iodine and the given subnet, change the variables at the top of the script to the following:

#### EDIT HERE ####

# Path to your iodine executable
IOD="/usr/local/Cellar/iodine/0.7.0/sbin/iodine"

# Your top domain
IOTD="t.example.com"

# You may choose to store the password in this script or enter it every time
#IOPASS="secret"

# You might need to change this if you use linux, or already have
# tunnels running.  In linux iodine uses dnsX and fbsd/osX use tunX
# X represents how many tunnel interfaces exist, starting at 0
IODEV="tun0"

# The IP your iodined server uses inside the tunnel
# The man page calls this tunnel_ip
IOIP="192.168.99.1"

#### STOP EDITING ####

Make sure that script is executable:

chmod a+x NStun.sh

Then run it as root:

sudo ./NStun.sh

And then try to use your browser… and it fails. Why? Because you need to make sure your remote server is set up to actually forward the packets to the outside world via NAT.

NAT on the Server

A great writeup about this process already exists; please see the section called “Configuring NAT and IP masquerading”.
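
The gist of it, as a rough sketch rather than a complete recipe (the eth0 interface name and the 192.168.99.0/24 tunnel subnet are assumptions carried over from the examples above), is to enable IP forwarding and masquerade the tunnel subnet, as root on the server:

# Allow the kernel to forward packets between interfaces
sysctl -w net.ipv4.ip_forward=1
# NAT traffic arriving from the tunnel subnet out the public interface
iptables -t nat -A POSTROUTING -s 192.168.99.0/24 -o eth0 -j MASQUERADE
iptables -A FORWARD -s 192.168.99.0/24 -j ACCEPT
iptables -A FORWARD -d 192.168.99.0/24 -m state --state ESTABLISHED,RELATED -j ACCEPT

These rules don’t survive a reboot; the linked writeup covers making them persistent.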

And that’s it — DNS tunneling for captive portals on OS X Mavericks.

Speed / Connectivity

So this method gets us around the captive portal, but the connection is not all that fast. And sometimes, even though we’re tunneled, some portals are still not completely defeated. Often you’ll need to restart the tunnel, or perhaps your networking, if you see the tunnel or connection drop.

The phone’s speed test results over 4G LTE:

Speed Test on a 4G LTE phone

Moments later, using tunneling, the tethered/tunneled computer’s results:

Speed Test on a DNS Tunneled Computer

Mapping NYC GIS Data with Google Maps

New York City publishes lots of geographic data from a variety of city departments. A lot of it is GIS data for mapping things like city park locations, beaches, playgrounds, and bathrooms. There’s even a tree census GIS project you can download for every borough. Every street light. Zoning data. Lots of fun stuff! It’s called the NYC DataMine; the geo-data sets are here. It’s cool, but the value of the data is limited unless you’re a GIS wonk or use GIS mapping tools, and if you use Linux or Mac like me, you might be out of luck for the free GUI tools. Why doesn’t the city publish its data in easier-to-read web formats? People could use it to throw onto Google maps, make location-aware NYC applications, etc. It is possible to work with their data in this way, but it takes a little wrangling.

Let’s take a look at the city Parks GIS project.

What’s inside that zip file? It’s a set of mostly binary files that describe shapes and polygons using points and line segments that demarcate the boundaries of all the NYC parks defined in the database. The shape files are “ESRI Shapefiles”, a format created by Esri, a GIS mapping software company. According to Wikipedia, Esri has captured a lot of the GIS toolset market, and apparently NYC uses their products. Along with these shape files is a dBase III database that contains metadata about those shapes (like the name of the park, what borough it’s in, its area, etc.). Normally, you’d open these files in a program like ArcGIS, but I don’t use Windows. Besides, this is 2011. I want to look at it on the web, probably on a Google Map.

So we have a few issues. The first is that the binary ESRI Shapefile (Parks.shp) needs to be interpreted into some kind of serial format for easier handling. Libraries exist in different languages to read this file format, but I’ve found them to be a bit clunky and it’s easier just to get it into something else.

Shape files basically contain definitions of shapes identified by points in 2D space. These points are (obviously) meant to be plotted on a map. But what kind of map? How is that map projected? You remember from elementary school the basic Mercator Projection: Take a transparent globe that has a light in the middle, wrap a sheet of paper around the globe’s equator to form a cylinder, turn on the light and trace the lines being projected from the globe. (That’s why it’s called a projection, after all.) Actually, what we were all taught in elementary school is not exactly the correct physical method for creating the projection, but the point is that when you project a spherical object onto a 2D surface, it gets distorted somewhere. This is important because the points of a shapefile can be spatially referenced to any of a number of projections. Today on the web, we mostly use latitude and longitude as input to a mapping API (like Google, or Yahoo!) and let the service figure out how to flatten it out back into 2D. Points described in degrees latitude and longitude are “spherically projected” but I have found it rare indeed for GIS data to be described so simply. GIS data tends to be described in a different spatial reference, and this is where our NYC Parks data gets a little complicated.

First, we need a tool that can actually read and write shapefiles and hopefully output them in friendlier formats. This is where the Geospatial Data Abstraction Layer (GDAL) library comes in. It’s available as a package for Debian and Ubuntu, and probably most other Linux distros as well. The GDAL toolset comes with a program called ogr2ogr, and that’s what we’re going to use to get the shapefile into something more handy.
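
On Debian and Ubuntu the relevant package is gdal-bin, which also includes ogrinfo, handy for peeking at a shapefile’s layers and attribute fields before converting anything:

sudo apt-get install gdal-bin
# Summary only (-so) of all layers (-al): geometry type, feature count, field names
ogrinfo -al -so Parks.shp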

But, in order to effectively convert our shape file, we need to know what spatial projection the points are described in, and what we want to re-project them into. The ogr2ogr switches we are interested in here are -s_srs and -t_srs, which identify the source file’s SRS (spatial reference system) and the SRS we want to convert/re-project into. It turns out that there are a lot of ways to describe an SRS. Some are well known, and organizations have labeled them in a standard way. But often two different standards bodies or organizations use different labels for the same SRS. SRSs are sometimes described by a formatted string of key/value pairs (sometimes called “Well Known Text” in the GIS world, or “WKT”). Some geo-spatial libraries even define their own standard for describing SRSs (if you’ve used Proj.4 you’ll know about their way). What it comes down to is that “standards” for describing spatial references don’t really exist. Or rather, there seem to be several parallel standards. Luckily, GDAL is good at understanding them.

So what’s our input SRS? The “WKT” of that SRS is going to be found in Parks.prj (for projection?). Just cat it out:

PROJCS["NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",984250.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-74.0],PARAMETER["Standard_Parallel_1",40.66666666666666],PARAMETER["Standard_Parallel_2",41.03333333333333],PARAMETER["Latitude_Of_Origin",40.16666666666666],UNIT["Foot_US",0.3048006096012192]],VERTCS["NAVD_1988",VDATUM["North_American_Vertical_Datum_1988"],PARAMETER["Vertical_Shift",0.0],PARAMETER["Direction",1.0],UNIT["Foot_US",0.3048006096012192]

Nice. So, you can see that our projection is the Lambert Conformal Conic, and we have some other parameters in here as well. As far as my research goes, things like “Datum” (“D North American 1983”) and “PROJCS” (“NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet”) indicate known US “state plane coordinate systems” that describe portions of the earth.

Ok, that’s the WKT for our input SRS (don’t worry, GDAL will just deal with that sucker). We need the output SRS. The coordinate system that GPS uses and pretty much all the mapping APIs expect as input is known as the World Geodetic System, last revised in 1984. Shorthand: WGS84. I’m not so sure ogr2ogr “knows” what WGS84 is by name. Its man page, however, indicates that it does know about SRSs described by a particular standards body, the Geomatics Committee. The Geomatics Committee calls WGS84 “EPSG:4326” and GDAL’s tool can handle that. (By the way, if you are using a Ruby or Python library that wraps Proj.4, or you need to open and parse shapefile data with other tools that require quirky SRS definitions, the Geomatics Committee website has great translations of the SRSs into WKT and proj.4 command-line switches, which you will definitely need when you instantiate that RGeo object, or some such.)
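
If you want to see exactly what GDAL thinks a label like EPSG:4326 expands to, newer GDAL releases ship a gdalsrsinfo utility (check whether your installed version includes it):

# Print the WKT and the equivalent Proj.4 string for WGS84
gdalsrsinfo -o wkt EPSG:4326
gdalsrsinfo -o proj4 EPSG:4326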

One more thing before actually doing this conversion/re-projection. GDAL doesn’t actually understand the Lambert Conformal Conic projection described in Parks.prj. There’s an updated (and as far as my testing goes, backwards compatible) revision of this projection which is defined as “Lambert_Conformal_Conic_2SP” and you must change your Parks.prj to read:

PROJCS["NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["False_Easting",984250.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-74.0],PARAMETER["Standard_Parallel_1",40.66666666666666],PARAMETER["Standard_Parallel_2",41.03333333333333],PARAMETER["Latitude_Of_Origin",40.16666666666666],UNIT["Foot_US",0.3048006096012192]],VERTCS["NAVD_1988",VDATUM["North_American_Vertical_Datum_1988"],PARAMETER["Vertical_Shift",0.0],PARAMETER["Direction",1.0],UNIT["Foot_US",0.3048006096012192]

OK! Now… what output format do we want the shapefile in? Indeed, ogr2ogr can output a new ESRI shapefile, or we can do something like… output it to GeoJSON, which seems like a winning format to me (check the man page for other fun formats you can use):

ogr2ogr -f "GeoJSON" -s_srs Parks.prj -t_srs EPSG:4326 Parks.json Parks.shp

And we’re done! We have a nice JSON-encoded string in Parks.json (albeit a very large one), with descriptions of all the Polygons and MultiPolygons that describe the boundaries of New York City’s parks in latitude and longitude! Easily munged to throw onto a Google map or some such. Each park entry even has its associated metadata.
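
A quick way to confirm what came out, again with GDAL’s tools, is to ask ogrinfo for a summary of the new GeoJSON layer and its fields:

ogrinfo -al -so Parks.json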