## Solution to Question 6.

Guidance: We can create a new column to specify whether a user is a re-patronage traveler or not, and then calculate the proportion of re-patronage travelers by country.

ALTER TABLE repatronage_user_assignment ADD COLUMN revisit TINYINT(1) NULL DEFAULT NULL;

UPDATE repatronage_user_assignment SET revisit = 0;

UPDATE repatronage_user_assignment SET revisit = 1 WHERE user_id IN

(SELECT user_id FROM (SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) as a);

After labelling the re-patronage travellers, we can calculate their respective proportions.

SELECT a.country, a.fre, b.fre2, a.fre/b.fre2 FROM (SELECT country, COUNT(*) fre FROM repatronage_user_assignment where revisit = 1 GROUP BY country) a

join (SELECT country, COUNT(*) fre2 FROM repatronage_user_assignment GROUP BY country) b ON a.country = b.country

We can also use one query to get the result, such as:

SELECT a.country, a.fre, b.fre2, a.fre/b.fre2 FROM

(SELECT country, COUNT(*) fre FROM repatronage_user_assignment

where user_id IN

(SELECT user_id FROM (SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) as c)

GROUP BY country) a

JOIN

(SELECT country, COUNT(*) fre2 FROM repatronage_user_assignment GROUP BY country) b ON a.country = b.country

Guidance: The key to answer this question is to i) first label which reviews belong to re-patronage travelers, and then ii) label the first-visit reviews of the identified re-patronage travelers based on the condition of minimal “review_date”. The second visits should also have minimal “review_date” after excluding the first-visit reviews.

Suggestion for solving similar problems in the future: if you want to zoom-in or study a part of data, label them or save them to another temporary table!

Please note that a traveler may visit both hotels (e.g. only once for each hotel), and this will cause problem if not taken into account.

Similar to the question 6.1, we first identify the reviews belonging to re-patronage travelers.

ALTER TABLE repatronage_review_assignment ADD COLUMN revisit TINYINT(1) NULL DEFAULT NULL;

UPDATE repatronage_review_assignment SET revisit = 0;

UPDATE repatronage_review_assignment SET revisit = 1 WHERE (user_id, hotel_id) IN

(SELECT user_id, hotel_id FROM

(SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) as a);

After that, we can specify the first visits. The review of a re-patronage traveler with minimal ‘review_date’ can be identified as the first visit.

ALTER TABLE repatronage_review_assignment ADD COLUMN orders char(10) NULL DEFAULT NULL AFTER revisit;

UPDATE repatronage_review_assignment SET orders = '1st' WHERE (user_id,hotel_id, review_date) IN (SELECT user_id,hotel_id, review_date FROM

(SELECT user_id,hotel_id, min(review_date) AS review_date  FROM repatronage_review_assignment where revisit = 1 GROUP BY user_id,hotel_id) a);

After labelling thre first visits, we can specify the second visits. The review of a re-patronage traveler with minimal ‘review_date’ can be identified as the first visit after excluding the first visits.

UPDATE repatronage_review_assignment SET orders = '2nd' WHERE (user_id,hotel_id, review_date) IN (SELECT user_id,hotel_id, review_date FROM

(SELECT user_id,hotel_id, min(review_date) AS review_date  FROM repatronage_review_assignment where revisit = 1 AND orders IS null  GROUP BY user_id,hotel_id) a);

Now, we can compare their average ratings directly.

SELECT orders, AVG(overall_rating),AVG(rooms_rating), AVG(service_rating),

AVG(location_rating), AVG(value_rating) FROM repatronage_review_assignment where orders is not null GROUP BY orders;

We could also get the average values of ratings pertinent to first patronage through one command, which is not recommended due to having a very complicated logic:

SELECT AVG(overall_rating),AVG(rooms_rating), AVG(service_rating),

AVG(location_rating), AVG(value_rating) FROM repatronage_review_assignment

WHERE

(user_id,hotel_id, review_date) IN

(SELECT user_id,hotel_id, review_date FROM

(SELECT user_id,hotel_id, min(review_date) AS review_date  FROM repatronage_review_assignment WHERE

user_id IN

(SELECT user_id FROM (SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) c)

GROUP BY user_id,hotel_id) a)

We could also get the average values of ratings pertinent to second patronage through one command, which is not recommended due to having a very complicated logic:

SELECT AVG(overall_rating),AVG(rooms_rating), AVG(service_rating),

AVG(location_rating), AVG(value_rating) FROM repatronage_review_assignment

WHERE (user_id,hotel_id, review_date) IN

(SELECT user_id,hotel_id, min(review_date) AS review_date FROM repatronage_review_assignment

WHERE

(user_id,hotel_id, review_date) NOT IN

(SELECT user_id,hotel_id, review_date FROM

(SELECT user_id,hotel_id, min(review_date) AS review_date  FROM repatronage_review_assignment WHERE

user_id IN

(SELECT user_id FROM (SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) c)

GROUP BY user_id,hotel_id) a)

AND user_id IN

(SELECT user_id FROM (SELECT user_id, hotel_id, COUNT(*) fre FROM repatronage_review_assignment GROUP BY user_id, hotel_id HAVING fre > 1) d)

GROUP BY user_id,hotel_id);

Based on the new columns created for question 6.2, it is easy to compute the difference between two lodging dates for each hotel.

SELECT hotel_id, AVG(timeinterval) FROM

(SELECT a.user_id, a.hotel_id, a.review_date AS RD1, b.review_date AS RD2, DATEDIFF(b.review_date, a.review_date) as timeinterval from (SELECT user_id, hotel_id,review_date FROM repatronage_review_assignment WHERE orders = '1st') a

JOIN (SELECT user_id, hotel_id,review_date FROM repatronage_review_assignment WHERE orders = '2nd') b ON a.user_id = b.user_id AND a.hotel_id = b.hotel_id) c GROUP BY hotel_id