PDA

View Full Version : mySQL join question


torifile
2009-11-03, 23:30
I've got a really simple question that I'm kind of embarrassed to ask, but what the hell!

I have 2 tables - an order table (call it OrderTable) and a customer table (call it CustomerTable). I'd like to generate a list of customer names that placed particular orders. What's the easy way to spit out a table like so:

CustomerTable.FirstName | CustomerTable.LastName | CustomerTable.Email | OrderTable.ProductsOrdered

The order table has OrderID, ProductsOrdered and CustomerID.

The customer table has CustomerID FN LN Email. Obviously there'd be some sort of join on the customer id.

That's the easy part. (But I still can't figure out the statement to get it out!)

Now I've got 3 tables: a payment table (call it PaymentTable) and the aforementioned other 2 tables.

The payment table has the PaymentID, OrderID and PaymentAmount.

I'd like to have a new listing with CustomerTable.FN | CustomerTable.LN | CustomerTable.Email | OrderTable.ProductsOrdered | PaymentTable.PaymentAmount as columns.

There's a join on the order ID between PaymentTable and OrderTable and a join on customer ID between the OrderTable and CustomerTable. Right?

* this is a gross simplification of the tables I'm dealing with but I can extrapolate from functional queries. I'm just having a hard time wrapping my mind around the join statements needed to generate the output I want.

Banana
2009-11-03, 23:44
If you want only customers who actually placed an order:

SELECT c.FirstName,
c.LastName,
c.Email,
o.ProductsOrdered
FROM CustomerTable c
JOIN OrderTable o
USING(CustomerID);

With the payment, that's just one more join. Try and think of it as joining to a 'new' table that's now Customer + Order.

SELECT c.FirstName,
c.LastName,
c.Email,
o.ProductsOrdered,
p.PaymentAmount
FROM CustomerTable c
JOIN OrderTable o
USING(CustomerID)
JOIN PaymentTable p
USING(OrderID);

Note that there could be a problem with the 2nd statement. For example, suppose you had an customer who placed an order but didn't have any record in the PaymentTable (maybe didn't make a payment yet for instance.) If you need to see those customers as well, then you need to change the JOIN to a LEFT JOIN to include those customers:

SELECT c.FirstName,
c.LastName,
c.Email,
o.ProductsOrdered,
p.PaymentAmount
FROM CustomerTable c
JOIN OrderTable o
USING(CustomerID)
LEFT JOIN PaymentTable p
USING(OrderID);


Note: Air-SQL. Test & verify.

Note 2: USING() is a MySQL extension. The standard SQL usually spell the JOIN condition out:

TableA a JOIN TableB ON a.ID = b.ID

MySQL understands both and the extended version is required if the column name is not same in both table or the condition shouldn't be a equi-join. Otherwise, I'm a lazy sod who don't want to type everything out. :D

Brad
2009-11-04, 00:28
Postgresql also supports "JOIN ... USING", FWIW, and I believe the SQL:2003 standard includes the USING keyword, but I can't find a free copy of the spec to check.

Otherwise, it looks like Banana's answer should suffice. :)

chucker
2009-11-04, 00:39
You could of course be even lazier in the above example and use NATURAL JOIN syntax, assuming there aren't any other overlapping column names.

I'm more a fan of explicit syntax, though. ;)

torifile
2009-11-04, 00:41
Just so I'm clear, the statement "CustomerTable c" is telling mySQL that when something says "c." it's to look at the CustomerTable? That is, "c" becomes an alias for CustomerTable?

Banana
2009-11-04, 00:47
Brad, cool; didn't know that. Nice to know they're trying to shorten up the keywords (assuming your memory is correct). One time I looked at SQL standard and they wanted to call VARCHAR "CHARACTER VARYING" :err:


Torifile, ayup! Observe my utter laziness. ;) But it also helps makes the SQL more readable by not having to swim through the tablenames everywhere, which becomes much more important when working with several joins with high potential for column naming collisions. :)

Banana
2009-11-04, 00:52
You could of course be even lazier in the above example and use NATURAL JOIN syntax, assuming there aren't any other overlapping column names.

I'm more a fan of explicit syntax, though. ;)

It's interesting because this is probably actually closer to the relational model than other JOINs as relations by definition does not allow for duplicate columns. I've yet to use it for some reasons, though there's a convenience to be had in not having to disambiguate between c.CustomerID and o.CustomerID, I suppose. :)

Brad
2009-11-04, 00:53
Just so I'm clear, the statement "CustomerTable c" is telling mySQL that when something says "c." it's to look at the CustomerTable? That is, "c" becomes an alias for CustomerTable?

Yes, and you guessed the name right. :) It's called an "alias" and I use them in nearly all of my queries. "cr" is a lot quicker to retype than "external_product_ebook_country_restriction", which is the kind of table name I've been dealing with lately.

chucker
2009-11-04, 01:04
It's interesting because this is probably actually closer to the relational model than other JOINs as relations by definition does not allow for duplicate columns. I've yet to use it for some reasons, though there's a convenience to be had in not having to disambiguate between c.CustomerID and o.CustomerID, I suppose. :)

Sure, but have two tables Contacts(ContactID, Name, CompanyID) and Companies(CompanyID, Name), and the thing breaks down, since you certainly didn't want to join a person's name with that of an organization. Yes, this can be avoided, but often at the cost of needlessly complex column names.

Yes, and you guessed the name right. :) It's called an "alias" and I use them in nearly all of my queries. "cr" is a lot quicker to retype than "external_product_ebook_country_restriction", which is the kind of table name I've been dealing with lately.

Of note, one can also add an "AS" in there to make it clearer that it's an alias, like so:

SELECT c.FirstName,
c.LastName,
c.Email,
o.ProductsOrdered
FROM CustomerTable AS c
JOIN OrderTable AS o
USING(CustomerID);

Either syntax (with or without the "AS") also works for column names:

SELECT c.FirstName,
c.LastName,
c.Email ContactEmail,
o.ProductsOrdered
FROM CustomerTable c
JOIN OrderTable o
USING(CustomerID);

Torifile, ayup! Observe my utter laziness. ;) But it also helps makes the SQL more readable by not having to swim through the tablenames everywhere, which becomes much more important when working with several joins with high potential for column naming collisions. :)

Yeah. It also helps clarify what the table/view/column's name means in the context of this query. For instance, suppose you want to have a table Contacts and a table Companies, and for each company, you'd want your main contact. Then you could alias Contacts to MainContact — more typing, but much clearer what is going on.

Brad
2009-11-04, 01:31
Either syntax (with or without the "AS") also works for column names:
And it gets more interesting when an alias doesn't refer to a "real" column or table at all. :)

Take this contrived example that aliases columns in an interesting way...select
case
when a.type in (1,2,3) then 'customer',
when a.type = 4 then 'partner',
else 'other'
end as account_type,
extract(year from a.register_date) as year,
extract(month from a.register_date) as month,
count(*) as number_registered
from account as a
group by account_type, year, month
order by account_type, year, month



* "extract" is a timestamp-specific extension in Postgresql... I assume there's something similar in MySQL, but its purpose here should be obvious.

chucker
2009-11-04, 02:04
Yep. Ad-hoc columns like that are extremely powerful. :)

Sadly, date functions differ starkly between RDBMSs. For instance, the above would have been something more like DATEPART(YYYY, a.register_date) in the T-SQL dialect used by MS SQL Server (and by Sybase).

Banana
2009-11-04, 02:15
Ain't that the truth. Date is probably the most worst data type to work in terms of compliance and consistency across different contexts (not just SQL DBMS, but also in programming languages). :\ It's a clusterfuck, certainly.

There's also interesting stuff about how they are stored. In MS-SQL, it's a double with 0 = 1900-1-1 while MySQL, as far as I can tell, stores the dates as a string, and it's possible to enter dates like 100-1-1 though the manual does not recommend storing any dates earlier than 1000-1-1 or later than 9999-12-31 (the why is never explained... I guess ambiguity?)

It's fairly easy to predict what functions you may want to call upon strings or numeric data types but good luck figuring out what is the right function for date/time. I have to look it up everytime I need to do operations upon date data type. :\

torifile
2009-11-08, 20:32
Thanks for all the advice and philosophical discussion ;) I've got one more question:

when I list out items ordered by particular customers, I get one line per line item, if that makes sense. Say a person pays for registration and selects workshop c, they have 2 lines in the results. Is there a way to have each person listed uniquely, with multiple columns added for additional items?

Banana
2009-11-08, 20:35
So you have this:

PersonA WorkShopA
PersonA WorkShopB
PersonA WorkShopC

and you want to come out like this:

PersonA WorkShopA, WorkShopB, WorkshopC

?

If so, look at GROUP_CONCAT function. It's technically a aggregating function so you need to treat it as such:

SELECT person, GROUP_CONCAT(workshop, ", ")
FROM people
JOIN workshops
USING (personID)
GROUP BY personID;

I hope that helps get you started. My syntax for the GROUP_CONCAT function may be off, so consult MySQL manual on that one.

torifile
2009-11-08, 21:00
So you have this:

PersonA WorkShopA
PersonA WorkShopB
PersonA WorkShopC

and you want to come out like this:

PersonA WorkShopA, WorkShopB, WorkshopC

?

If so, look at GROUP_CONCAT function. It's technically a aggregating function so you need to treat it as such:

SELECT person, GROUP_CONCAT(workshop, ", ")
FROM people
JOIN workshops
USING (personID)
GROUP BY personID;

I hope that helps get you started. My syntax for the GROUP_CONCAT function may be off, so consult MySQL manual on that one.
This is very helpful. Thanks. :) Now, I'm running into a problem with BLOBs being returned. Really, at most, I'd have like 2 or 3 products purchased per user. I'm looking to get the product id, so it's numerical. Thoughts?

Banana
2009-11-08, 21:27
Hmm. I'm not sure I follow how BLOBs is being returned. A bit more context may help.

But for more general problem of wanting only say top 3 items displayed, you need a subquery for that. Here's my best guess, off-the-top-of-head, untested SQL:

SELECT person, GROUP_CONCAT(workshop)
FROM people
JOIN (
SELECT personID, workshop
FROM workshops
LIMIT 3) t
USING (personID);

The idea is that you use a derived table that generates 3 workshop per person, join to the people table then GROUP_CONCAT upon that instead of the full table.

Makes sense?

torifile
2009-11-08, 22:15
I'm not sure what was going on with the BLOBs either so I switched the column reported from the product ID to the actual name and that gave me the title of the workshop, etc, I needed. Thanks. :)

You have no idea how much easier you made my life by giving me those queries. What only took 1 hour to clean up would have taken hours to compile by hand and would have no doubt been extremely error prone. Thanks A LOT!