Talk:Join (SQL)

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Databases.
??? This article has not yet received a rating on the assessment scale.
??? This article has not yet received an importance rating on the assessment scale.

New comments at the bottom of the page, please.

Contents

[edit] SemiJoins

what about semijoins?

From the article:
A semi join is an efficient join method where first the join attributed of one table are collected and reported to the second one. In was first reported in 1981. It can be improved with a Bloom-Filter (hashing).
Yeah, it could do with a lot more detail. Neilc 13:59, 16 Apr 2005 (UTC)
The problem with the current description for semi joins is that it doesn't say anything at all: How is something reported? What means "collect" in the above sentence? And how exactly would the improvement with hashing work? I think that we either expand this to something comprehensible or remove it. --Stolze 21:50, 14 April 2007 (UTC)

I removed the semi-join stuff from the article because it didn't say anything meaningful. --Stolze 11:52, 8 May 2007 (UTC)

[edit] Images for join types

It would be good if someone would add images to help people visualize how each type of join is evaluated. I'm thinking that there should be drawings with the overlapping rings, highlighting the solution set. I would be willing to create them, however, I do not have experience loading images into Wikipedia. DBBell 21:08, 14 December 2005 (UTC)

I hope you don't mix this up with set operations like UNION, EXCEPT, and INTERSECT? If not, you could draw the images and we will incorporate them. --Stolze 12:19, 22 April 2007 (UTC)

[edit] Excellent Article

i would not normally comment like this and i suppose it is off topic, however i just want to add that i found the page extremely helpful and interesting. it explains the taxonomy and relationships between different join types such that i am able to say i have a grasp of the subject after 5 minutes of reading. the article demystifies the subject matter very successfully. bravo.

I definitely agree. This article has above-average usefulness IMO. A big 'thanks' to the authors. —Preceding unsigned comment added by 24.150.125.164 (talk) 06:53, 7 May 2008 (UTC)

[edit] Left Outer Join

Just a quick question. Was it neccasary to have a "select _distinct_ *" in the code example for left outer joins?

It is not necessary. DISTINCT is a concept orthogonal to joins. Therefore, I removed this keyword.

[edit] Equi-Join Columns

I believe this might be an error. From the article:

The resulting joined table contains two columns named DepartmentID, one from table Employee and one from table Department.

If I'm not mistaken, this would be incorrect behavior under the SQL standard. I believe this is how it worked in earlier versions of MySQL, and possibly other DBMSs, but as per standard MySQL now only includes one instance of the join column.

Also, under Natural Joins, the order of the columns given may be incorrect (once again, possibly this is the actual order in some implementations). I believe the spec calls for the join columns to come first, and then the other columns of each table in turn.

I am basing this on the [MySQL docs], the part which begins:

The columns of a NATURAL join or a USING join may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns for SELECT * expansion may be different from before.

Won't edit, as I don't think implementation docs are necessarily a good source for the standard, and I may be misunderstanding. DocAvid (talk) 21:03, 10 January 2008 (UTC)

You are correct, according to ISO/IEC 9075-2:2003 natural joins do suppress duplicate columns. This is from Section 7.7 (Syntax rules):
"7) If there is at least one corresponding join column, then let SLCC be a <select list> 
of <derived column>s of the form

COALESCE ( TA.C, TB.C ) AS C

for every column C that is a corresponding join column, taken in order of their
ordinal positions in RT1."

It's stating that the resulting duplicate named column in the select list is the coalesced result from the columns of the same name in both table A and table B.SqlPac (talk) 05:51, 6 February 2008 (UTC)

It should be clearer what is meant by "A" and "B" in the article, i.e. which part of the sample query is the A table and which is the B table. (better yet, call the tables A and B to make it completely clear. —Preceding unsigned comment added by 70.66.236.249 (talk) 22:26, 19 March 2008 (UTC)

[edit] Implicit outer joins

Does this article mean that SQL 2003 doesn't support implicit left/right joins using *= and =* syntax? 71.4.124.241 (talk) 18:10, 14 April 2008 (UTC)

[edit] Full outer joins by left joins

I believe the example of how to construct a full outer join using only left joins is incorrect (the first one using outer left and outer right joins seems good). There is nothing ensuring that the two tables that are being unioned haved the same column ordering. The correct fix would be to ensure that the second left join has the same column ordering as the first left join by instead of doing doing

UNION
SELECT *
FROM   department

doing

UNION
SELECT employee.*, department.*
FROM   department

You probably need a similar fix for the case of only using right outer joins. I verified that the command as shown does not work as stated in SQLite (version 3.4). Note that I am not a SQL expert though.

[edit] Foreign key error

The Sample tables section mentions the use of a foreign key:

"In the following tables, department.DepartmentID is the primary key, while employee.DepartmentID is a foreign key."

This is incorrect given the definition of a foreign key [1], as it mentions: "The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL)."

Unfortunately, this seems to be the case, as the article itself says: "On the other hand, the employee 'Jasper' has no link to any currently valid Department in the Department Table." and as such this would violate the referential integrity constraint.

The examples are excellent, but I think it shouldn't be mentioned that it is a foreign key. I don't think it matters to the example anyway, because the presence of a foreign key is not required.

[1] http://en.wikipedia.org/wiki/Foreign_key —Preceding unsigned comment added by Svenmathijssen (talk • contribs) 10:04, 19 May 2008 (UTC)

Yeah. Department.departmentID is not a real Foreign Key, so the article should be changed. Aaron Schulz 14:00, 19 May 2008 (UTC)

[edit] Computation of Cartesian product

From the article:

"The SQL-engine computes the Cartesian product of all records in the tables. That is, processing combines each record in table A with every record in table B. Only those records in the joined table that satisfy the join predicate remain."

I seriously doubt the SQL engine will explicitly compute the Cartesian product for such a query, especially in the presence of a clustered primary key on Department.DepartmentID further optimized by a sort order on Employee.DepartmentID (as the example suggests). The result can be thought of as a Cartesian product with the unnecessary tuples removed, but the engine will benefit from using the index. —Preceding unsigned comment added by Svenmathijssen (talk • contribs) 10:21, 19 May 2008 (UTC)

Yeah, it really would use the index and scan while joining with each row, otherwise it would be hella slow. Aaron Schulz 13:56, 19 May 2008 (UTC)