Completed

Top 100 Cities in America

Published on June 12, 2014 in IT & Programming

About this project


We need a list of the top 100 cities for each of the following countries: USA, Mexico, Colombia, Argentina, Chile, Peru and Uruguay.

We also need a second list with the regions/states of each country.

We will also require the software used to extract the locations.

The cities list shall have the following fields:
City Name, State Name, Country Name, Aliases

The states list shall have the following fields:
State Name, Country Name, Aliases
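The posting fixes the fields of both lists but not the file format. As a minimal sketch, assuming CSV output (an assumption, not a stated requirement), the two record types could be modeled like this; the class and function names are hypothetical. Because an alias list such as "Bcn, Barna" contains commas, the `csv` module quotes that field automatically.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class City:
    name: str
    state: str
    country: str
    aliases: list[str] = field(default_factory=list)


@dataclass
class State:
    name: str
    country: str
    aliases: list[str] = field(default_factory=list)


def write_cities(cities, out):
    """Write the cities list; all aliases go into one comma-separated field."""
    writer = csv.writer(out)
    writer.writerow(["City Name", "State Name", "Country Name", "Aliases"])
    for c in cities:
        writer.writerow([c.name, c.state, c.country, ", ".join(c.aliases)])


def write_states(states, out):
    """Write the states list with the same conventions."""
    writer = csv.writer(out)
    writer.writerow(["State Name", "Country Name", "Aliases"])
    for s in states:
        writer.writerow([s.name, s.country, ", ".join(s.aliases)])


buf = io.StringIO()
write_cities([City("Barcelona", "Catalonia", "Spain", ["Bcn", "Barna"])], buf)
print(buf.getvalue())
```

Keeping aliases in a single quoted field mirrors the one-row-per-city layout of the example tables; a separate alias table would be an equally valid choice if the consumer prefers it.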


For example, suppose the same exercise were done for European countries, including Spain. The city of Barcelona should then be included in the cities list like this:

City Name  | State Name | Country Name | Aliases
Barcelona  | Catalonia  | Spain        | Bcn, Barna

This is because many Twitter locations say "Barcelona, Spain", while others just say "barna" or "bcn".
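One way to use such aliases is an index from every normalized name and alias to the city it may denote. The sketch below is illustrative only: the function names are hypothetical, and the normalization (casefolding, accent stripping, whitespace collapsing) is an assumption that also folds variants like "Cataluña" into "cataluna". It looks up a whole location string; inputs such as "Barcelona, Spain" would first need to be split on the comma.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip accents and collapse whitespace: 'Cataluña' -> 'cataluna'."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def build_city_index(rows):
    """Map each normalized city name and alias to the (city, state, country) rows it may denote.

    Values are lists because an alias can be ambiguous (e.g. "León" exists in Mexico and Spain).
    """
    index = {}
    for name, state, country, aliases in rows:
        entry = (name, state, country)
        for alias in [name, *aliases]:
            matches = index.setdefault(normalize(alias), [])
            if entry not in matches:
                matches.append(entry)
    return index


def lookup(index, raw_location):
    """Return candidate cities for a raw location such as 'barna' or 'BCN'."""
    return index.get(normalize(raw_location), [])


index = build_city_index([("Barcelona", "Catalonia", "Spain", ["Bcn", "Barna"])])
print(lookup(index, "barna"))  # [('Barcelona', 'Catalonia', 'Spain')]
```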

Catalonia should also be included in the regions/states list, like this:

State Name | Country Name | Aliases
Catalonia  | Spain        | Catalunya, Cataluña

We will provide a dataset with hundreds of thousands of anonymous locations from all over the world, extracted from the Twitter public API.

We will provide you with a text file in which each line contains a location exactly as it was written on Twitter. Some of them will be useless, like "in my house"; some will be quite simple, like "New York"; and some will be harder to resolve, like "León, Gjto", which means "León, Guanajuato". In that case you are supposed to infer that "Gjto" is an alias for Guanajuato (a Mexican state).
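A location line like "León, Gjto" can be resolved by splitting it on commas, matching the first part against city aliases and, when the second part is a known state alias, keeping only cities in that state. This is a minimal sketch under assumed index shapes (described in the docstring); the function names and sample data are hypothetical. It shows why "Gjto" matters: while it is unknown, "León" stays ambiguous.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip accents and collapse whitespace: 'León' -> 'leon'."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def split_location(raw: str) -> list[str]:
    """Split a raw Twitter location on commas: 'León, Gjto' -> ['leon', 'gjto']."""
    return [part for part in (normalize(p) for p in raw.split(",")) if part]


def resolve(raw, city_index, state_index):
    """Return the (city, state, country) candidates for a raw location string.

    city_index:  normalized city alias  -> list of (city, state, country)
    state_index: normalized state alias -> list of (state, country)
    """
    parts = split_location(raw)
    if not parts:
        return []
    candidates = city_index.get(parts[0], [])
    # When the second part is a known state alias, keep only cities in that state.
    if len(parts) > 1 and parts[1] in state_index:
        allowed = set(state_index[parts[1]])
        candidates = [c for c in candidates if (c[1], c[2]) in allowed]
    return candidates


# Hypothetical sample data for illustration only.
city_index = {
    "leon": [("León", "Guanajuato", "Mexico"), ("León", "Castile and León", "Spain")],
    "new york": [("New York", "New York", "USA")],
}
state_index = {"guanajuato": [("Guanajuato", "Mexico")]}

print(resolve("León, Guanajuato", city_index, state_index))  # only the Mexican León
print(resolve("León, Gjto", city_index, state_index))        # both: 'gjto' is still unknown
print(resolve("in my house", city_index, state_index))       # []
```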

As a starting point, we suggest the following approach. It is just an idea, so feel free to use it or whatever you think will work better:

1. Create a list with the country names and, probably, the official region/state names and city names.
2. Iterate over the provided locations and extract matches against those lists.
3. Analyze the matches to find new aliases.
4. Iterate again.
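The loop above could be sketched for state aliases as a voting scheme: in every "City, X" location where the city is known but X is not a known state alias, each candidate city votes for its state (an ambiguous city splits its vote). X is accepted as an alias when one state gets enough support and a large enough share, and the process repeats until nothing new is learned. This is one possible heuristic, not the posting's prescribed method; the function names, the thresholds `min_support` and `min_share`, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def split_location(raw: str) -> list[str]:
    return [part for part in (normalize(p) for p in raw.split(",")) if part]


def propose_state_aliases(locations, city_index, state_index,
                          min_support=2.0, min_share=0.8):
    """Suggest state aliases for unknown second parts of 'City, X' locations."""
    votes = defaultdict(Counter)  # unknown token -> weighted votes per (state, country)
    for raw in locations:
        parts = split_location(raw)
        if len(parts) != 2:
            continue
        city_part, state_part = parts
        candidates = city_index.get(city_part)
        if not candidates or state_part in state_index:
            continue
        weight = 1.0 / len(candidates)  # an ambiguous city splits its vote
        for _, state, country in candidates:
            votes[state_part][(state, country)] += weight
    proposals = {}
    for token, counter in votes.items():
        best, score = counter.most_common(1)[0]
        if score >= min_support and score / sum(counter.values()) >= min_share:
            proposals[token] = best
    return proposals


def discover_state_aliases(locations, city_index, state_index, max_rounds=5):
    """Repeat match -> analyze -> extend until no new aliases appear.

    Note: state_index is updated in place with the learned aliases.
    """
    learned = {}
    for _ in range(max_rounds):
        new = propose_state_aliases(locations, city_index, state_index)
        if not new:
            break
        for token, state in new.items():
            state_index.setdefault(token, []).append(state)
            learned[token] = state
    return learned


# Hypothetical sample data for illustration only.
city_index = {
    "leon": [("León", "Guanajuato", "Mexico"), ("León", "Castile and León", "Spain")],
    "celaya": [("Celaya", "Guanajuato", "Mexico")],
    "irapuato": [("Irapuato", "Guanajuato", "Mexico")],
}
state_index = {"guanajuato": [("Guanajuato", "Mexico")]}
locations = ["León, Gjto", "Celaya, Gjto", "Irapuato, gjto", "León, Guanajuato", "in my house"]

print(discover_state_aliases(locations, city_index, state_index))
# {'gjto': ('Guanajuato', 'Mexico')}
```

The thresholds would need tuning on the real dataset. City aliases such as "bcn" could plausibly be learned the same way by voting from the state or country part (e.g. "bcn, Spain"), which this sketch does not cover.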

Thanks!


Delivery term: Not specified
