Some concepts are extremely easy to explain in words, yet difficult to extract from a database. For example, on a ticket-purchasing site, it’s pretty obvious that you need a way to quickly show visitors “current movies in my area.” That is certainly an easy concept, but let’s examine what it takes to get that kind of information out of our database.
“Current” means we need to be looking in the movie_showtimes table. We might define current to mean movies starting within a week.
Knowing the showtime isn’t enough. We need to get the movie info, too. A showtime without the movie name isn’t very useful. While we’re at it, we probably need to know the rating and the length of the movie, as well.
The request “in my area” means we need to know where the visitor is and where the movie theatre is. We can use the PL/pgSQL distance procedure miles_between_lat_long that we wrote in Chapter 6, but to do so we need data from the theatres table (the zip code) and from the zip_codes table (the latitude and longitude).
This represents a query with four tables: movie_showtimes, theatres, movies, and zip_codes. We could write an ActiveRecord query that could get us what we want:
MovieShowtime.find(:all, :include => [:movie, :theatre], :conditions => " movie_showtimes.start_time - now() < '1 week'::interval and movie_showtimes.start_time > now() " )
The first problem we notice, as far as ActiveRecord die-hards are concerned, is that we’ve got some SQL peaking through here. There’s no good way to say “current” in ActiveRecord parlance. The next problem is that we haven’t accounted for “in my area” yet. To do that, we need to get to the zip_codes table through the theatres table, but that’s not supported using the :include syntax. Certainly we don’t want to get all of the current showtimes and then loop through them in the application layer to find the ones that are nearby. There are likely to be far more that aren’t close than those that are, so that would be inexcusably slow.
We can get to the zip code data using the :joins syntax, but this is just another mechanism to inject raw SQL through ActiveRecord:
MovieShowtime.find(:all, :include => [:movie, :theatre], :joins => "join zip_codes on (theatres.zip_code = zip_codes.zip)", :conditions => [" movie_showtimes.start_time - now() < '1 week'::interval and movie_showtimes.start_time > now() and miles_between_lat_long( zip_codes.latitude, zip_codes.longitude, ?, ? ) < ? ", request.latitude, request.longitude, request.miles] )
It’s getting pretty ugly, and this is with an example just complex enough to prove a point, yet simple enough to include in this book. A more complex, real-life query might extend more than half a page. Plainly speaking, this is not what ActiveRecord is good at. As soon as you have custom SQL in your ActiveRecord queries, you’ve lost one of the main benefits of ActiveRecord: that queries are written to be database-independent.
Complex joins and complex conditions are never going to be easy to abstract away behind a simple to use library. In this case, with ease-of-use comes a loss of functionality. If we try to avoid SQL at all costs, using only the most well polished aspects of ActiveRecord, such as simple finds, and then take care of the joins and conditions in the application layer, we’re going to pay dearly in performance costs. The database is optimized for these purposes, and the application layer is not.
We’ve hit upon a moment where SQL really wants to shine through. Here is an opportunity to sit upon the shoulders of giants, rather than try to hide the giant.
Luckily, we can restore order in the application layer by creating a view-backed model, which is the topic of the remainder of this chapter. And what’s more, if we delegate this complexity to the database, we have further opportunities for performance enhancements that we wouldn’t have otherwise. That’s the topic of the next chapter: materialized views.
There are two ways to think of a database view. The first way is as a named subquery, ready to be referenced in other queries. The second way to look at a view is to think of it as a table that is defined by an algorithm—in the form of a SQL query—that can, with a few caveats, be treated like any other table.
Most people think of views by the second definition, but both are correct. In fact, you should be able to predict what the caveats mentioned in the table definition are by contrasting how a real table and a subquery can and cannot be used. Go ahead and think about it. The answers will be revealed in the “Considerations” section later in this chapter.
Creating a View
The syntax for creating a view is simple:
create view name as query;
For current movies, we would create the following view:
create or replace view current_movie_showtimes as select m.name, m.rating_id, m.length_minutes, ms.*, t.name as theatre_name, t.zip_code, z.latitude, z.longitude from movie_showtimes ms join movies m on (ms.movie_id = m.id) join theatres t on (ms.theatre_id = t.id) join zip_codes z on (t.zip_code = z.zip) where (ms.start_time - now()) < '1 week'::interval and ms.start_time > now();
Notice that lots of relevant data has been brought into the view from the referenced tables movies and theatres. The reason this is done is so that getting that information in the future doesn’t require us to join against those tables for a second time.
It’s also a good idea to select the primary key columns of tables, e.g., movies.id and theatres.id, so that it is still possible to do an ActiveRecord join later if necessary. In fact, the view above does implicitly select these columns; they are included in ms.*. Those columns exist in the movie_showtimes table as movie_id and theatre_id. When we see how to base an ActiveRecord model on a view in the next section, this will come in handy as it allows us to define associations just like in any other model. We’ll gain flexibility at no additional cost.
Basing a Model on a View
Basing a model on a view is actually straightforward. The syntax is the same as it would be for a normal table. For the current_movie_showtimes table, our CurrentMovieShowtime class is defined like this:
class CurrentMovieShowtime < ActiveRecord::Base belongs_to :movie belongs_to :theatre belongs_to :auditorium, :foreign_key => [:room, :theatre_id] end
We also define the inverse relationships in the related classes. For example, in the Movie class, we have associations to both the MovieShowtime class as well as the CurrentMovieShowtime class:
class Movie < ActiveRecord::Base has_many :movie_showtimes, :dependent => :destroy has_many :current_movie_showtimes end
The difference is that the relationship defined with the view cannot have a destroy dependency defined. You can only modify views by modifying the tables they depend on, so deleting from them would be meaningless (it would also cause an error).
The rest of the ActiveRecord magic still applies. You can access current showtimes directly through an association. For example:
You can also use all of the automatically defined ActiveRecord accessor on the CurrentMovieShowtime class itself as well:
Our original example of finding “current movies in my area” is now much simpler as well:
CurrentMovieShowtime.find(:all, :conditions => [' miles_between_lat_long( current_movie_showtimes.lat, current_movie_showtimes.long, ?, ? ) < ? ', lat, long, miles] )
Because we defined this query within a class-level method, we can use it when traversing associations. For example, assuming we have a ZipCode object in the variable zip, we can find the current showtimes within 10 miles of that zip code with the following statement:
m.current_movie_showtimes.find_all_within_distance( zip.latitude, zip.longitude, 10 )
This results in only a single round trip to the database to get the results. The result objects are CurrentMovieShowtime objects, which have available all the movie and theatre information as attributes, and also further support associations traversal to get at actual Movie and Theatre objects if needed.
Although views are often thought of as “just-in-time” tables, and from Rails’ perspective when creating models, we can treat views just like tables, views are not tables. We began this section stating that a view should be thought of as a named subquery. Let’s take a very simple example and imagine that we defined a view, view_of_movies, which is essentially the same as the movies table itself:
create view view_of_movies as
select * from movies;
The view name, view_of_movies, is now a name for the sub-query select * from movies;. If we wanted to do a simple select of all the records in this view, the query would look like this:
select * from view_of_movies;
But if we expand to show the subquery, we’re actually doing this:
select * from (select * from movies);
Now it should become clear that many standard table operations won’t work with views. Some operations that won’t work on views include:
- Referencing from another table
- Adding constraints
Although this seems like a long list of things you can’t do, and it is, that’s actually okay. These aren’t disadvantages of views; they’re just not what views are meant for. The word “view” itself implies that they are for looking at, not for modifying.
Let’s examine each of these constraints in turn and see what it means for our Rails application.
Insert, Update, Delete
Attempting to insert, update, or delete on a view triggers a database error. When you need to write data, you must do it against table-backed models. This means that view-backed models are not replacement for table-backed ones. They must exist in parallel, and you must use the right one for the right circumstances.
When you are exclusively reading data, especially when reading data that matches the criteria imposed by the view’s conditions, you should use the view-backed model. Some actions are only reads, such as a request that would display “current movies within my area.”
If you are creating new data, use the table-backed model. An example of a write-only request might be “buy a ticket.”
Some cases are less obvious. For example, in a request in the administrative interface, a single request may need to add a showtime, and then immediately return current movies at a theatre to show back to the administrative user. In this case, it’s clear that you would use the table-backed MovieShowtime model to add the new showtime. When you’re requesting data back out, you could use the CurrentMovieShowtime model to simplify the ActiveRecord query.
A good way to think about it is with these rules of thumb:
- If data is coming in, use a table-backed model.
- If data is going out, use a view-backed model.
These rules are shown in Figure 11-1.
References and Constraints
Views do not support references from other tables, nor do they support constraints. Using the named subquery model of views, the reasons why should be clear. But you don’t need either operation.
The purpose of a reference is to ensure that data referenced from one table exists in another. A view is based on other tables, so the values you reference in a view should exist in the original table, too. If you are exclusively using a view, you may find that when you need a new reference, you instinctively think it should be added to the view. Instead, find the appropriate table the view is based on and reference that table instead.
Constraints are also unnecessary in a view. The purpose of a table constraint is to say that a row with certain properties is not valid and should not exist in the table. In a view, you can filter out invalid rows with the conditions of the view query. If you don’t want rows with the property column_a < column_b, add that to the view’s definition:
create view my_view as select * from some_table where column_b >= column_a;
The purpose of an index is to precompute the location of rows that meet a certain criteria in a table. When you select items out of a view, any indexes that exist on the base tables will still be used if possible. However, if you are computing a column of new values and want to filter based on that column, you’re out of luck. Because views are just-in-time, data that did not exist in the base tables cannot benefit from indexes created on those tables. Each row must be computed and then selected or discarded, giving you the performance akin to a full-table scan.
Before getting disheartened, remember that without the view, the complicated ActiveRecord query has the same (or worse) performance. The original purpose of the view was to improve clarity in the application layer, not to boost performance. If you find that the query you turned into a view happens to be your application’s biggest bottleneck, there are additional options. The topic of the next chapter, materialized views, explains how you can, with some database wizardry, attain the benefits of views with the performance of fully indexed tables.
- Recast a complicated ActiveRecord query as a simpler query on a view-backed model. Now select the same information out of both, recording the time before and after. Repeat multiple times with each to eliminate the impact of disk and database caches. How does performance compare?
- Using the same complex query and view from above, compare the database plans for each. In postgres, you can get a query plan with the command explain plan query.
- Write a custom ActiveRecord extension plugin, backed_by_view, that when specified, disables functions that would attempt to write to the view.
- Locate complicated ActiveRecord queries that are repeated throughout your application. They don’t have to be identical; “similar” is a high-enough bar.
- Write a single SQL query that returns a superset of the data from the queries in the previous step.
- Create a database view based on that query. Be sure to choose a name that abides by ActiveRecord naming conventions. Create a view with:
create view viewname as query;
- Create a model based on the view. Remember to carry over any associations from the base tables, but omit destroy dependencies.
- Within associated model classes, add associations for the new view-backed model. Again, omit destroy dependencies.
- Replace the complicated ActiveRecord queries from the first step with more concise queries against the new view-backed model.
- Run your tests.
|Chapter 10 : Multiple Table Inheritance
||Chapter 12 : Materialized Views|